# Errors with the `init` estimator parameter of Gradient Boosting

This short notebook explores some potential bugs with the `init` parameter of scikit-learn's gradient boosting estimators. I wanted to use a linear model as the `init` estimator, so that a simple linear function captures the linear relationships and gradient boosting then refines the non-linear parts. I'd love to see that work, so here I'm testing what might not be working.

## Very minimal example

In [10]:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=50, n_informative=10, random_state=0)
init_clf = LogisticRegressionCV(cv=3)
clf = GradientBoostingClassifier(init=init_clf)
clf.fit(X, y)

IndexError: too many indices for array

In [11]:
import sklearn; sklearn.show_versions()


System
------
    python: 3.7.0 (default, Jun 28 2018, 13:15:42)  [GCC 7.2.0]
executable: /home/gstoddard/anaconda3/envs/eis_env/bin/python
   machine: Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-redhat-7.5-Maipo

BLAS
----
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /home/gstoddard/anaconda3/envs/eis_env/lib
cblas_libs: mkl_rt, pthread

Python deps
-----------
       pip: 18.0
setuptools: 39.2.0
   sklearn: 0.20.0
     numpy: 1.15.0
     scipy: 1.1.0
    Cython: None
    pandas: 0.23.3


In [1]:
from sklearn.linear_model import LassoCV, LogisticRegressionCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.metrics import accuracy_score, roc_auc_score
import numpy as np 
from sklearn.model_selection import train_test_split

## Errors with standard estimators

This next part shows that errors occur with both `GradientBoostingClassifier` and `GradientBoostingRegressor`.

In [12]:
X, y = make_classification(n_samples=1000, n_features=50, n_informative=10, random_state=0)

init_clf = LogisticRegressionCV()
clf = GradientBoostingClassifier(max_depth=3, n_estimators=10, learning_rate=0.1, init=init_clf)
clf.fit(X, y)

IndexError: too many indices for array

In [13]:
init_clf = LassoCV()
clf = GradientBoostingRegressor(max_depth=3, n_estimators=10, learning_rate=0.1, init=init_clf)
clf.fit(X, y)

TypeError: fit() takes 3 positional arguments but 4 were given
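
The `TypeError` above is consistent with gradient boosting calling the init estimator's `fit` with three positional arguments (`X`, `y`, and the sample weights) while the estimator's `fit` only accepts `X` and `y`. The sketch below reproduces that mismatch in plain Python. `NoWeightEstimator`, `WeightAwareEstimator`, `accepts_sample_weight`, and `try_fit_with_weights` are hypothetical names made up for illustration and are not part of scikit-learn.

```python
import inspect


class NoWeightEstimator:
    """Hypothetical stand-in whose fit() takes only X and y."""

    def fit(self, X, y):
        return self


class WeightAwareEstimator:
    """Hypothetical stand-in whose fit() also accepts sample_weight."""

    def fit(self, X, y, sample_weight=None):
        return self


def accepts_sample_weight(estimator):
    """Return True if estimator.fit has a parameter named sample_weight."""
    return "sample_weight" in inspect.signature(estimator.fit).parameters


def try_fit_with_weights(estimator, X, y, sample_weight):
    """Call fit with three positional arguments and report the outcome."""
    try:
        estimator.fit(X, y, sample_weight)
    except TypeError as exc:
        return "TypeError: {}".format(exc)
    return "ok"


# Tiny placeholder data; the values do not matter for the signature check.
X, y, w = [[0.0], [1.0]], [0, 1], [1.0, 1.0]
for est in (NoWeightEstimator(), WeightAwareEstimator()):
    print(type(est).__name__, accepts_sample_weight(est), try_fit_with_weights(est, X, y, w))
```

The same `accepts_sample_weight` check can be pointed at a real estimator instance to see whether its `fit` would tolerate the extra argument before passing it as `init`.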

## Errors when trying to get past the `sample_weight` argument bug

As a test, I wanted to see what would happen if I could get past the `sample_weight` bug. Let's define a wrapper estimator whose `fit` accepts `sample_weight` and simply ignores it, and see how far we can get. Spoiler: we don't get far.

In [23]:
from sklearn.base import BaseEstimator, ClassifierMixin

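class InitEstimator(BaseEstimator, ClassifierMixin):
    """Thin wrapper that lets any estimator accept a sample_weight argument."""

    def __init__(self, base_clf):
        self.base_clf = base_clf

    def fit(self, X, y=None, sample_weight=None, **fit_params):
        # sample_weight and fit_params are accepted but deliberately ignored
        self.base_clf.fit(X, y)
        return self  # scikit-learn convention: fit returns the estimator

    def predict(self, X, y=None):
        # Delegate to the wrapped estimator; the output shape is unchanged
        return self.base_clf.predict(X)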


In [25]:
init_clf = InitEstimator(LassoCV(cv=3))
clf = GradientBoostingRegressor(init=init_clf)
clf.fit(X, y)

IndexError: too many indices for array
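
The `IndexError` persists even once `sample_weight` is accepted, so it looks like a separate problem. A plausible reading, not confirmed here against the scikit-learn source, is that the boosting code indexes the init predictions as a 2-D array of shape `(n_samples, 1)`, while `predict` on most estimators returns a 1-D array of shape `(n_samples,)`. The NumPy sketch below only illustrates that shape mismatch. `first_column` is a hypothetical helper, and the zero array stands in for real predictions.

```python
import numpy as np

# Stand-in for the output of a typical predict(): shape (n_samples,)
flat_pred = np.zeros(5)


def first_column(pred):
    """Index the first column, as code expecting a 2-D array would."""
    return pred[:, 0]


try:
    first_column(flat_pred)
    outcome = "ok"
except IndexError as exc:
    # Two indices on a 1-D array: NumPy reports "too many indices for array"
    outcome = "IndexError"
    print("IndexError:", exc)

# Reshaping to a single column makes the same indexing valid.
column_pred = flat_pred.reshape(-1, 1)
print(outcome, first_column(column_pred).shape)
```

If this reading is right, a wrapper whose `predict` returns `self.base_clf.predict(X).reshape(-1, 1)` would be the next thing to try, though whether that is sufficient on sklearn 0.20.0 is untested here.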