Q1. What is Gradient Boosting Regression?

Gradient boosting regression is an ensemble method that combines many weak learners, usually shallow decision trees, into one strong predictor. A weak learner is a simple model that on its own captures only a little of the structure in the data; for regression, it is only slightly better than always predicting the mean. The learners are added one at a time, each fitted to the errors left by the ensemble so far, so the combined model can be much more accurate than any individual weak learner.

Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a
simple regression problem as an example and train the model on a small dataset. Evaluate the model's
performance using metrics such as mean squared error and R-squared.

In [57]:
from sklearn.datasets import make_regression

# Synthetic regression problem: 100 samples, 4 features, one target
X, y = make_regression(n_samples=100, n_features=4, n_targets=1)

In [59]:
y

array([-1.01130564e+02, -1.48815101e+02, -2.11758871e+01, -2.10956680e+02,
       -4.91657706e+01, -2.56322325e+01, -1.07314499e+02,  4.30424458e+01,
       -8.40626990e+01, -1.40445533e+02, -4.49963088e+01, -1.95261075e+01,
        2.17417791e+02, -2.96020611e+01, -5.34976148e+01,  1.97361644e+02,
       -2.39022038e+01,  8.96173602e+01, -3.06040417e+01, -1.41563633e+02,
       -5.86527527e+01, -2.16346072e+02,  1.81852799e+02, -2.09848836e+02,
       -9.95472600e-02, -1.01805194e+02,  8.93906320e+00,  1.85470958e+02,
        5.19965180e+01, -1.96716557e+02, -5.13465205e+00, -1.21693724e+02,
       -6.14259922e+01,  7.15624334e+01,  1.90636394e+02,  3.07438012e+00,
       -2.48726544e+01, -9.14929067e+01,  1.28124250e+02, -5.15972992e+01,
        3.75048946e+01,  6.34553808e+01, -8.63540750e+01, -1.16347234e+02,
       -2.72372832e+01, -1.53569157e+01,  1.64756869e+01, -1.17022698e+01,
       -1.24170421e+02,  1.12346731e+02, -1.25590631e+02, -1.34698733e+02,
        1.46830633e+02, -

In [60]:
from sklearn.model_selection import train_test_split

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [61]:
from sklearn.ensemble import GradientBoostingRegressor

# Default hyperparameters: 100 trees, learning_rate=0.1, max_depth=3
regressor = GradientBoostingRegressor()
regressor.fit(X_train, y_train)

In [62]:
y_pred = regressor.predict(X_test)

In [63]:
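from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# scikit-learn metrics take (y_true, y_pred). MSE and MAE are symmetric,
# but r2_score is not, so the argument order matters for R-squared
# (the R_squared printed below was produced with the arguments swapped).
print(f"mse: {mean_squared_error(y_test, y_pred)}")
print(f"mae: {mean_absolute_error(y_test, y_pred)}")
print(f"R_squared: {r2_score(y_test, y_pred)}")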

mse: 1330.836486283171
mae: 28.049259895072016
R_squared: 0.8484242661440771


Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to
optimise the performance of the model. Use grid search or random search to find the best
hyperparameters.

In [75]:
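# Candidate values for the grid search: 3 * 3 * 3 * 3 = 81 combinations
parameter = {'learning_rate': [0.1, 1, 10],
             'n_estimators': [10, 100, 200],
             'max_depth': [1, 3, 5],
             'loss': ['squared_error', 'absolute_error', 'quantile'],
}

In [76]:
from sklearn.model_selection import GridSearchCV

# 3-fold cross-validation; refit=True retrains the best model on the full training set
grid = GridSearchCV(regressor, param_grid=parameter, refit=True, cv=3, verbose=3)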

In [77]:
grid.fit(X_train, y_train)

Fitting 3 folds for each of 81 candidates, totalling 243 fits
[CV 1/3] END learning_rate=0.1, loss=squared_error, max_depth=1, n_estimators=10;, score=0.339 total time=   0.0s
[CV 2/3] END learning_rate=0.1, loss=squared_error, max_depth=1, n_estimators=10;, score=0.470 total time=   0.0s
[CV 3/3] END learning_rate=0.1, loss=squared_error, max_depth=1, n_estimators=10;, score=0.360 total time=   0.0s
[CV 1/3] END learning_rate=0.1, loss=squared_error, max_depth=1, n_estimators=100;, score=0.829 total time=   0.0s
[CV 2/3] END learning_rate=0.1, loss=squared_error, max_depth=1, n_estimators=100;, score=0.864 total time=   0.0s
[CV 3/3] END learning_rate=0.1, loss=squared_error, max_depth=1, n_estimators=100;, score=0.854 total time=   0.0s
[CV 1/3] END learning_rate=0.1, loss=squared_error, max_depth=1, n_estimators=200;, score=0.881 total time=   0.0s
[CV 2/3] END learning_rate=0.1, loss=squared_error, max_depth=1, n_estimators=200;, score=0.902 total time=   0.0s
[CV 3/3] END learning

[CV 1/3] END learning_rate=0.1, loss=quantile, max_depth=5, n_estimators=100;, score=0.738 total time=   0.2s
[CV 2/3] END learning_rate=0.1, loss=quantile, max_depth=5, n_estimators=100;, score=0.134 total time=   0.2s
[CV 3/3] END learning_rate=0.1, loss=quantile, max_depth=5, n_estimators=100;, score=0.457 total time=   0.1s
[CV 1/3] END learning_rate=0.1, loss=quantile, max_depth=5, n_estimators=200;, score=0.771 total time=   0.3s
[CV 2/3] END learning_rate=0.1, loss=quantile, max_depth=5, n_estimators=200;, score=0.394 total time=   0.3s
[CV 3/3] END learning_rate=0.1, loss=quantile, max_depth=5, n_estimators=200;, score=0.293 total time=   0.5s
[CV 1/3] END learning_rate=1, loss=squared_error, max_depth=1, n_estimators=10;, score=0.819 total time=   0.0s
[CV 2/3] END learning_rate=1, loss=squared_error, max_depth=1, n_estimators=10;, score=0.689 total time=   0.0s
[CV 3/3] END learning_rate=1, loss=squared_error, max_depth=1, n_estimators=10;, score=0.837 total time=   0.0s
[CV 

[CV 3/3] END learning_rate=1, loss=quantile, max_depth=3, n_estimators=100;, score=-0.397 total time=   0.0s
[CV 1/3] END learning_rate=1, loss=quantile, max_depth=3, n_estimators=200;, score=0.225 total time=   0.0s
[CV 2/3] END learning_rate=1, loss=quantile, max_depth=3, n_estimators=200;, score=-1.078 total time=   0.0s
[CV 3/3] END learning_rate=1, loss=quantile, max_depth=3, n_estimators=200;, score=-0.653 total time=   0.0s
[CV 1/3] END learning_rate=1, loss=quantile, max_depth=5, n_estimators=10;, score=-0.306 total time=   0.0s
[CV 2/3] END learning_rate=1, loss=quantile, max_depth=5, n_estimators=10;, score=-1.553 total time=   0.0s
[CV 3/3] END learning_rate=1, loss=quantile, max_depth=5, n_estimators=10;, score=-0.637 total time=   0.0s
[CV 1/3] END learning_rate=1, loss=quantile, max_depth=5, n_estimators=100;, score=0.209 total time=   0.0s
[CV 2/3] END learning_rate=1, loss=quantile, max_depth=5, n_estimators=100;, score=-1.557 total time=   0.0s
[CV 3/3] END learning_ra

  * np.sum(sample_weight * ((y - raw_predictions.ravel()) ** 2))
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)

[CV 1/3] END learning_rate=10, loss=squared_error, max_depth=1, n_estimators=200;, score=-inf total time=   0.0s
[CV 2/3] END learning_rate=10, loss=squared_error, max_depth=1, n_estimators=200;, score=-inf total time=   0.0s


[CV 3/3] END learning_rate=10, loss=squared_error, max_depth=1, n_estimators=200;, score=-inf total time=   0.0s
[CV 1/3] END learning_rate=10, loss=squared_error, max_depth=3, n_estimators=10;, score=-9976019311415533568.000 total time=   0.0s
[CV 2/3] END learning_rate=10, loss=squared_error, max_depth=3, n_estimators=10;, score=-11205857877311731712.000 total time=   0.0s
[CV 3/3] END learning_rate=10, loss=squared_error, max_depth=3, n_estimators=10;, score=-7828144310874687488.000 total time=   0.0s
[CV 1/3] END learning_rate=10, loss=squared_error, max_depth=3, n_estimators=100;, score=-57890723897274898960295363421873424596088301360291368424846037042338928370351474753813879376260376097154541707416434765616592512594587526963019800786126365454267729738049049698012200022573056.000 total time=   0.0s
[CV 2/3] END learning_rate=10, loss=squared_error, max_depth=3, n_estimators=100;, score=-65020744789148192933793874004298590795940979108163288990723263659413159300334411346343451716122

[CV 2/3] END learning_rate=10, loss=squared_error, max_depth=3, n_estimators=200;, score=-inf total time=   0.0s
[CV 3/3] END learning_rate=10, loss=squared_error, max_depth=3, n_estimators=200;, score=-inf total time=   0.0s
[CV 1/3] END learning_rate=10, loss=squared_error, max_depth=5, n_estimators=10;, score=-13061592056528494592.000 total time=   0.0s
[CV 2/3] END learning_rate=10, loss=squared_error, max_depth=5, n_estimators=10;, score=-15861253336633757696.000 total time=   0.0s
[CV 3/3] END learning_rate=10, loss=squared_error, max_depth=5, n_estimators=10;, score=-8784801796443547648.000 total time=   0.0s
[CV 1/3] END learning_rate=10, loss=squared_error, max_depth=5, n_estimators=100;, score=-74040083048631910306231912305403150019748445799404972488530442060775138412116100801233850322179079427498699314476186043496555966754421867267265427737842426075082147593444594414898470021234688.000 total time=   0.0s


[CV 2/3] END learning_rate=10, loss=squared_error, max_depth=5, n_estimators=100;, score=-98449010225605742927523146960217776666713874426178440158795432466976574001064691621804975811543014960083446437965832010547939209912985762462543708503596226420371646668975921675848132065230848.000 total time=   0.0s
[CV 3/3] END learning_rate=10, loss=squared_error, max_depth=5, n_estimators=100;, score=-50406102105682027966184128485709464895938043779725601225349188809698646500469965823016184077573098403266126189744597687690149655211413298925800374222373140874511121638029758567400404972208128.000 total time=   0.0s


[CV 1/3] END learning_rate=10, loss=squared_error, max_depth=5, n_estimators=200;, score=-inf total time=   0.0s
[CV 2/3] END learning_rate=10, loss=squared_error, max_depth=5, n_estimators=200;, score=-inf total time=   0.0s


[CV 3/3] END learning_rate=10, loss=squared_error, max_depth=5, n_estimators=200;, score=-inf total time=   0.0s
[CV 1/3] END learning_rate=10, loss=absolute_error, max_depth=1, n_estimators=10;, score=-6935912147403289600.000 total time=   0.0s
[CV 2/3] END learning_rate=10, loss=absolute_error, max_depth=1, n_estimators=10;, score=-10820234519514120192.000 total time=   0.0s
[CV 3/3] END learning_rate=10, loss=absolute_error, max_depth=1, n_estimators=10;, score=-2369851174477310976.000 total time=   0.0s
[CV 1/3] END learning_rate=10, loss=absolute_error, max_depth=1, n_estimators=100;, score=-40249017423896874240655727660800913542107683023561328229604688509471295505772850980034882120753672373393062710919418543465561519870968019735565567636594414692880208577567339567572991277531136.000 total time=   0.0s
[CV 2/3] END learning_rate=10, loss=absolute_error, max_depth=1, n_estimators=100;, score=-627896949307764606701586450966928686267897526945761307067366522246708063135440743716376624

  numerator = (weight * (y_true - y_pred) ** 2).sum(axis=0, dtype=np.float64)


[CV 1/3] END learning_rate=10, loss=absolute_error, max_depth=1, n_estimators=200;, score=-inf total time=   0.1s


[CV 2/3] END learning_rate=10, loss=absolute_error, max_depth=1, n_estimators=200;, score=-inf total time=   0.1s
[CV 3/3] END learning_rate=10, loss=absolute_error, max_depth=1, n_estimators=200;, score=-inf total time=   0.1s
[CV 1/3] END learning_rate=10, loss=absolute_error, max_depth=3, n_estimators=10;, score=-6458889778774497280.000 total time=   0.0s
[CV 2/3] END learning_rate=10, loss=absolute_error, max_depth=3, n_estimators=10;, score=-12546681998271139840.000 total time=   0.0s
[CV 3/3] END learning_rate=10, loss=absolute_error, max_depth=3, n_estimators=10;, score=-4220516511122179072.000 total time=   0.0s
[CV 1/3] END learning_rate=10, loss=absolute_error, max_depth=3, n_estimators=100;, score=-37480862164933194360728801654356258521165103110575704436304469898528175334167708582673376338225504289869095820362107321228954328693694701123848818269548225143269340999616204276016692898824192.000 total time=   0.1s
[CV 2/3] END learning_rate=10, loss=absolute_error, max_depth=3, n_estimators=100;, score=-72808249549476258228882874173297812198879839680917542750417967238871451571098890118453476

[CV 1/3] END learning_rate=10, loss=absolute_error, max_depth=3, n_estimators=200;, score=-inf total time=   0.2s
[CV 2/3] END learning_rate=10, loss=absolute_error, max_depth=3, n_estimators=200;, score=-inf total time=   0.2s
[CV 3/3] END learning_rate=10, loss=absolute_error, max_depth=3, n_estimators=200;, score=-inf total time=   0.3s
[CV 1/3] END learning_rate=10, loss=absolute_error, max_depth=5, n_estimators=10;, score=-6979614572936300544.000 total time=   0.0s
[CV 2/3] END learning_rate=10, loss=absolute_error, max_depth=5, n_estimators=10;, score=-12978753621715736576.000 total time=   0.0s
[CV 3/3] END learning_rate=10, loss=absolute_error, max_depth=5, n_estimators=10;, score=-4550690966023083520.000 total time=   0.0s
[CV 1/3] END learning_rate=10, loss=absolute_error, max_depth=5, n_estimators=100;, score=-40508108581424631757396090615138262239644366742400361323550196961991894972779662540418619196865721624260079304037927109548660258434707681814748020013741769224231249541460294857520144063660032.000 total time=   0.1s
[CV 2/3] END learning_rate=10, loss=absolute_error, max_depth=5, n_estimators=100;, score=-75303442119678830255171895668605667017079203288495917127765357598807393643857994539265784

[CV 1/3] END learning_rate=10, loss=absolute_error, max_depth=5, n_estimators=200;, score=-inf total time=   0.3s
[CV 2/3] END learning_rate=10, loss=absolute_error, max_depth=5, n_estimators=200;, score=-inf total time=   0.3s
[CV 3/3] END learning_rate=10, loss=absolute_error, max_depth=5, n_estimators=200;, score=-inf total time=   0.3s
[CV 1/3] END learning_rate=10, loss=quantile, max_depth=1, n_estimators=10;, score=-2897644219680737280.000 total time=   0.0s
[CV 2/3] END learning_rate=10, loss=quantile, max_depth=1, n_estimators=10;, score=-31485477365132144.000 total time=   0.0s
[CV 3/3] END learning_rate=10, loss=quantile, max_depth=1, n_estimators=10;, score=-1320141821251797504.000 total time=   0.0s
[CV 1/3] END learning_rate=10, loss=quantile, max_depth=1, n_estimators=100;, score=-16814995651000145931145386000452724291242051641158637770381654429499494557310016595005899044304041096398776976031700708513491001398602829261959216629332159185695323988622921787545922526248960.000 total time=   0.0s
[CV 2/3] END learning_rate=10, loss=quantile, max_depth=1, n_estimators=100;, score=-18196541814845962403480513111609654153975780072166623397746483271036718349531821043601355433978733067671564214027142740615

[CV 2/3] END learning_rate=10, loss=quantile, max_depth=1, n_estimators=200;, score=-inf total time=   0.1s
[CV 3/3] END learning_rate=10, loss=quantile, max_depth=1, n_estimators=200;, score=-inf total time=   0.1s
[CV 1/3] END learning_rate=10, loss=quantile, max_depth=3, n_estimators=10;, score=-3297640036726853632.000 total time=   0.0s
[CV 2/3] END learning_rate=10, loss=quantile, max_depth=3, n_estimators=10;, score=-8968589107717640192.000 total time=   0.0s
[CV 3/3] END learning_rate=10, loss=quantile, max_depth=3, n_estimators=10;, score=-3766104101360928768.000 total time=   0.0s
[CV 1/3] END learning_rate=10, loss=quantile, max_depth=3, n_estimators=100;, score=-17077643543530542515220912409617584475434435501032031991705351063227246177170438798865427946791764718615850719558282788454433712596153653312580132463516923503904618191734195002083540192985088.000 total time=   0.1s
[CV 2/3] END learning_rate=10, loss=quantile, max_depth=3, n_estimators=100;, score=-62042637072115299990441347454708138709568592323139470742608869364691662858614885515183902864086459891084536542179896872655767368682616354545665812735528138885967017197816736328184253948887040.000 total time=   0.1s
[CV 3/3] END learni

[CV 1/3] END learning_rate=10, loss=quantile, max_depth=3, n_estimators=200;, score=-inf total time=   0.3s
[CV 2/3] END learning_rate=10, loss=quantile, max_depth=3, n_estimators=200;, score=-inf total time=   0.2s
[CV 3/3] END learning_rate=10, loss=quantile, max_depth=3, n_estimators=200;, score=-inf total time=   0.2s
[CV 1/3] END learning_rate=10, loss=quantile, max_depth=5, n_estimators=10;, score=-3097014002955383808.000 total time=   0.0s
[CV 2/3] END learning_rate=10, loss=quantile, max_depth=5, n_estimators=10;, score=-9807992422479833088.000 total time=   0.0s
[CV 3/3] END learning_rate=10, loss=quantile, max_depth=5, n_estimators=10;, score=-2699897253005186048.000 total time=   0.0s
[CV 1/3] END learning_rate=10, loss=quantile, max_depth=5, n_estimators=100;, score=-24160739888396180507634869769327373326417545652989926275904172616082510074653585939944136127086872462150856327913896012536028339757551202672011421474868215759959871314099478814213356161859584.000 total time=   0.1s
[CV 2/3] END learning_rate=10, loss=quantile, max_depth=5, n_estimators=100;, score=-568536861541047705250558120287766535107841448110609634131415258896240248521346507038182357777571794933451803522430141483638179

[CV 1/3] END learning_rate=10, loss=quantile, max_depth=5, n_estimators=200;, score=-inf total time=   0.3s
[CV 2/3] END learning_rate=10, loss=quantile, max_depth=5, n_estimators=200;, score=-inf total time=   0.3s
[CV 3/3] END learning_rate=10, loss=quantile, max_depth=5, n_estimators=200;, score=-inf total time=   0.3s


  8.67071442e-001  8.68789275e-001  6.65584578e-001  8.14683828e-001
  8.21000797e-001  3.21298710e-001  7.93329415e-001  8.68371783e-001
  5.63175467e-001  8.70608604e-001  8.79242047e-001  5.72837509e-001
  8.79582534e-001  8.72184198e-001 -1.86542451e+000 -6.88668283e-001
 -6.16310759e-001 -1.22159230e+000  4.42598361e-002  5.36072372e-001
 -1.15980600e+000  4.43316827e-001  4.86370422e-001  7.81610486e-001
  8.35876385e-001  8.37095851e-001  6.92048639e-001  6.81592263e-001
  6.92440652e-001  6.71672528e-001  7.58666819e-001  7.22658059e-001
  7.75647273e-001  8.60624353e-001  8.52102850e-001  6.08631055e-001
  6.66822861e-001  6.40863868e-001  6.46975759e-001  6.75585935e-001
  6.81347742e-001 -2.62644828e-001 -5.09155434e-002 -7.11237430e-002
 -5.02680507e-001 -5.38396003e-001 -5.02352202e-001 -8.31989217e-001
 -7.54046367e-001 -6.48848115e-001 -8.09099645e+018 -4.69519582e+190
             -inf -9.6700

In [78]:
y_pred_gcv = grid.predict(X_test)

In [79]:
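from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Metrics take (y_true, y_pred); the order matters for r2_score
# (the R_squared printed below was produced with the arguments swapped).
print(f"mse: {mean_squared_error(y_test, y_pred_gcv)}")
print(f"mae: {mean_absolute_error(y_test, y_pred_gcv)}")
print(f"R_squared: {r2_score(y_test, y_pred_gcv)}")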

mse: 702.3739148442303
mae: 20.674679296077308
R_squared: 0.9286659907976517


In [82]:
print(f"Best hyperparameters using GridSearchCV are: {grid.best_params_}")

Best hyperparameters using GridSearchCV are: {'learning_rate': 0.1, 'loss': 'squared_error', 'max_depth': 1, 'n_estimators': 200}
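
Q3 also mentions random search. The sketch below shows how `RandomizedSearchCV` could sample hyperparameters from distributions instead of a fixed grid. It keeps the learning rate in a small range, since the grid values of 1 and 10 above produced negative or non-finite scores. The dataset seed, the sampling ranges, `n_iter=10`, and the `_rs` variable names are assumptions made for illustration, not settings from the notebook.

```python
from scipy.stats import loguniform, randint
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Illustrative data with a fixed seed (assumption) so the run is reproducible
X_rs, y_rs = make_regression(n_samples=100, n_features=4, n_targets=1, random_state=0)
X_rs_train, X_rs_test, y_rs_train, y_rs_test = train_test_split(
    X_rs, y_rs, test_size=0.3, random_state=42
)

# Distributions to sample from (ranges are assumptions, not tuned values)
param_distributions = {
    "learning_rate": loguniform(0.01, 0.3),  # log-uniform between 0.01 and 0.3
    "n_estimators": randint(50, 201),        # integers 50..200
    "max_depth": randint(1, 6),              # integers 1..5
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,       # number of sampled candidates
    cv=3,
    random_state=0,
)
search.fit(X_rs_train, y_rs_train)

print(f"Best hyperparameters using RandomizedSearchCV are: {search.best_params_}")
print(f"test R_squared: {search.score(X_rs_test, y_rs_test)}")
```

Random search evaluates a fixed budget of candidates (`n_iter`) no matter how many hyperparameters are tuned, which is often cheaper than an exhaustive grid.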


Q4. What is a weak learner in Gradient Boosting?

In gradient boosting, a weak learner is a simple model that performs only slightly better than a trivial baseline: random guessing for classification, or always predicting the mean for regression. Each weak learner is fitted not to the original targets but to the errors (residuals) left by the ensemble built so far, so its predictions correct the mistakes of the previous weak learners.

The weak learners in gradient boosting are typically shallow decision trees. Decision trees can be used for classification or regression tasks. They work by recursively splitting the data into smaller subsets until each subset is sufficiently homogeneous or a depth limit is reached. For a new data point, a classification tree predicts the majority class of the subset (leaf) that the point falls into, while a regression tree predicts the mean target value of that leaf.
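
As an illustration, the sketch below compares one depth-1 regression tree (a "stump") with a boosted ensemble of 200 such stumps. The synthetic dataset, the seeds, and the variable names are assumptions chosen for the example and are not part of the notebook above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Illustrative data; sizes and seeds are arbitrary assumptions
X_demo, y_demo = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

# One weak learner: a single split, predicting the mean target on each side
stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X_tr, y_tr)

# Many weak learners of the same kind, combined by gradient boosting
boosted = GradientBoostingRegressor(
    max_depth=1, n_estimators=200, learning_rate=0.1, random_state=0
).fit(X_tr, y_tr)

r2_stump = r2_score(y_te, stump.predict(X_te))
r2_boosted = r2_score(y_te, boosted.predict(X_te))
print(f"single stump R_squared:   {r2_stump:.3f}")
print(f"boosted stumps R_squared: {r2_boosted:.3f}")
```

On data like this the single stump explains only part of the variance, while the ensemble of the same stumps does considerably better.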

Q5. What is the intuition behind the Gradient Boosting algorithm?

The intuition behind the Gradient Boosting algorithm is to build a model that minimizes a loss function by adding weak learners sequentially. Each weak learner is trained to correct the errors made by the learners before it, so the accuracy of the ensemble improves with each round.

The loss function measures the difference between the predicted values and the actual values. Gradient boosting minimizes it by gradient descent in the space of predictions: each new weak learner is fitted to the negative gradient of the loss with respect to the current predictions, and a small multiple of its output is added to the model. For squared-error loss the negative gradient is simply the residual, actual minus predicted.
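
A small numerical check can make the link between gradients and residuals concrete. The sketch assumes the loss 0.5 * sum((y - F)^2) and uses made-up toy values; it estimates the gradient by central differences and compares its negative with the residuals.

```python
import numpy as np

# Toy targets and current predictions (made-up values for illustration)
y_true = np.array([3.0, -1.0, 2.5])
current_pred = np.array([2.0, 0.0, 2.5])

def squared_error_loss(pred):
    """Half the sum of squared errors, so the gradient has no factor of 2."""
    return 0.5 * np.sum((y_true - pred) ** 2)

# Central-difference estimate of d(loss)/d(pred_i) for each prediction
eps = 1e-6
numeric_grad = np.zeros_like(current_pred)
for i in range(len(current_pred)):
    step = np.zeros_like(current_pred)
    step[i] = eps
    numeric_grad[i] = (
        squared_error_loss(current_pred + step) - squared_error_loss(current_pred - step)
    ) / (2 * eps)

residuals = y_true - current_pred
print("negative gradient:", -numeric_grad)
print("residuals:        ", residuals)
print("match:", np.allclose(-numeric_grad, residuals, atol=1e-6))
```

This is why, for squared-error loss, fitting the next learner to the residuals is a gradient descent step on the loss.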

Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?

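The gradient boosting algorithm builds the ensemble sequentially:

1. Initialize the model with a constant prediction, such as the mean of the targets for squared-error loss.

2. Compute the pseudo-residuals: the negative gradient of the loss with respect to the current predictions. For squared-error loss these are the ordinary residuals, actual minus predicted.

3. Train a weak learner (typically a shallow decision tree) to predict the pseudo-residuals.

4. Add the weak learner's predictions, scaled by the learning rate, to the current model.

5. Repeat steps 2-4 until a desired number of weak learners is trained.


The final model is the initial constant plus the learning-rate-scaled sum of the predictions of the weak learners. Unlike AdaBoost, gradient boosting does not reweight the data points between rounds; each new learner instead targets the errors that remain after the previous rounds.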
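
A minimal from-scratch sketch in NumPy, in the spirit of Q2, assuming squared-error loss and depth-1 trees (stumps) as weak learners: it starts from the mean of the targets, repeatedly fits a stump to the current residuals, and adds a learning-rate-scaled copy of each stump to the predictions. All function names, the toy data, and the default settings are hypothetical choices for illustration, and the exhaustive split search is far slower than scikit-learn's implementation.

```python
import numpy as np

def fit_stump(X, targets):
    """Return (feature, threshold, left_value, right_value) with the lowest squared error."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        # Splitting at every unique value except the largest keeps both sides non-empty
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            left_value, right_value = targets[left].mean(), targets[~left].mean()
            sse = ((targets[left] - left_value) ** 2).sum() \
                + ((targets[~left] - right_value) ** 2).sum()
            if sse < best_sse:
                best, best_sse = (j, t, left_value, right_value), sse
    return best

def predict_stump(stump, X):
    j, t, left_value, right_value = stump
    return np.where(X[:, j] <= t, left_value, right_value)

def fit_gradient_boosting(X, y, n_estimators=50, learning_rate=0.1):
    f0 = y.mean()                       # constant initial prediction
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_estimators):
        residuals = y - pred            # negative gradient of 0.5 * (y - pred)^2
        stump = fit_stump(X, residuals)
        if stump is None:               # no valid split (all features constant)
            break
        pred += learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return f0, learning_rate, stumps

def predict_gradient_boosting(model, X):
    f0, learning_rate, stumps = model
    pred = np.full(X.shape[0], f0)
    for stump in stumps:
        pred += learning_rate * predict_stump(stump, X)
    return pred

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Small synthetic regression problem (made-up coefficients and noise level)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 4))
y_toy = X_toy @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.1, size=100)
X_fit, y_fit = X_toy[:70], y_toy[:70]
X_val, y_val = X_toy[70:], y_toy[70:]

model = fit_gradient_boosting(X_fit, y_fit)
train_pred = predict_gradient_boosting(model, X_fit)
val_pred = predict_gradient_boosting(model, X_val)
baseline_mse = mse(y_fit, np.full(len(y_fit), y_fit.mean()))

print(f"baseline train mse: {baseline_mse:.3f}")
print(f"boosted train mse:  {mse(y_fit, train_pred):.3f}")
print(f"validation mse: {mse(y_val, val_pred):.3f}, R_squared: {r_squared(y_val, val_pred):.3f}")
```

Because each stump is a least-squares fit to the residuals, every round lowers the training error as long as the learning rate stays between 0 and 2, so with a learning rate of 10 the predictions would diverge instead.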

Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting
algorithm?

Here are the steps involved in constructing the mathematical intuition of the Gradient Boosting algorithm:

1. Define the loss function. The loss function is a differentiable measure of the difference between the predicted values and the actual values. The goal of gradient boosting is to minimize the loss function.

2. Choose the weak learners. The weak learners are the models that are used to build the ensemble. They are typically shallow decision trees, but other models can also be used.

3. Initialize the model with a constant. The initial prediction is the constant that minimizes the loss over the training data; for squared-error loss this is the mean of the targets.

4. Compute the pseudo-residuals. For each data point, take the negative gradient of the loss with respect to the current prediction. For squared-error loss this is the actual value minus the predicted value.

5. Train a weak learner on the pseudo-residuals. The learner approximates the direction in which the predictions should move to reduce the loss.

6. Update the model. Add the output of the weak learner, multiplied by the learning rate, to the current predictions. A small learning rate makes each step cautious and usually improves generalization at the cost of needing more learners.

7. Repeat steps 4-6 until a desired number of weak learners is trained. The final model is the initial constant plus the learning-rate-scaled sum of the predictions of the weak learners.
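
In symbols, a simplified version of the standard formulation can be written as follows. The notation is an assumption introduced here: $L$ is the loss, $F_m$ the model after $m$ rounds, $h_m$ the $m$-th weak learner, $\nu$ the learning rate, $n$ the number of training points, and $M$ the number of rounds. The per-leaf line search used by tree-based implementations is omitted.

```latex
\begin{align*}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \\
r_{im} &= -\left[ \frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)} \right]_{F = F_{m-1}},
          \qquad i = 1, \dots, n \\
h_m &= \arg\min_{h} \sum_{i=1}^{n} \bigl( r_{im} - h(x_i) \bigr)^2 \\
F_m(x) &= F_{m-1}(x) + \nu \, h_m(x), \qquad m = 1, \dots, M
\end{align*}

% Special case: squared-error loss
\begin{align*}
L(y, F) = \tfrac{1}{2} (y - F)^2
\quad \Longrightarrow \quad
F_0(x) = \frac{1}{n} \sum_{i=1}^{n} y_i,
\qquad
r_{im} = y_i - F_{m-1}(x_i)
\end{align*}
```

For squared-error loss the pseudo-residuals $r_{im}$ are the ordinary residuals, which is why each round can be read as fitting the errors left by the previous rounds.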