Base learner for NGBClassifier #225

Closed
imitusov opened this issue Feb 4, 2021 · 1 comment

imitusov commented Feb 4, 2021

Am I right that NGBClassifier has no option to use a classifier as its Base learner?

As soon as I pass a classifier as the Base learner, I get the following error. If I instead pass learner = RandomForestRegressor(n_estimators=500, max_depth=7, min_samples_leaf=50, n_jobs=-1, max_features="sqrt", max_samples=.66, random_state=42), everything works fine.
The example was taken from

from ngboost import NGBClassifier
from ngboost.distns import k_categorical, Bernoulli
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

_X, _y = load_breast_cancer(return_X_y=True)
_y[0:15] = 2 # artificially make this a 3-class problem instead of a 2-class problem
X_cls_train, X_cls_test, Y_cls_train, Y_cls_test = train_test_split(_X, _y, test_size=0.2)
# A classifier is passed as the base learner here; this is what triggers the error below
learner = RandomForestClassifier(n_estimators=500, max_depth=7, min_samples_leaf=50, n_jobs=-1,
    max_features="sqrt", max_samples=.66, random_state=42, class_weight='balanced')

ngb_cat = NGBClassifier(Base=learner, Dist=k_categorical(3), verbose=False) # tell ngboost that there are 3 possible outcomes
_ = ngb_cat.fit(X_cls_train, Y_cls_train) # Y should have only 3 values: {0,1,2}
ValueError                                Traceback (most recent call last)
<ipython-input-21-8eb58a515847> in <module>
     10 
     11 ngb_cat = NGBClassifier(Base=learner, Dist=k_categorical(3), verbose=False) # tell ngboost that there are 3 possible outcomes
---> 12 _ = ngb_cat.fit(X_cls_train, Y_cls_train) # Y should have only 3 values: {0,1,2}

~/.local/lib/python3.7/site-packages/ngboost/ngboost.py in fit(self, X, Y, X_val, Y_val, sample_weight, val_sample_weight, train_loss_monitor, val_loss_monitor, early_stopping_rounds)
    255             grads = D.grad(Y_batch, natural=self.natural_gradient)
    256 
--> 257             proj_grad = self.fit_base(X_batch, grads, weight_batch)
    258             scale = self.line_search(proj_grad, P_batch, Y_batch, weight_batch)
    259 

~/.local/lib/python3.7/site-packages/ngboost/ngboost.py in fit_base(self, X, grads, sample_weight)
    139     def fit_base(self, X, grads, sample_weight=None):
    140         models = [
--> 141             clone(self.Base).fit(X, g, sample_weight=sample_weight) for g in grads.T
    142         ]
    143         fitted = np.array([m.predict(X) for m in models]).T

~/.local/lib/python3.7/site-packages/ngboost/ngboost.py in <listcomp>(.0)
    139     def fit_base(self, X, grads, sample_weight=None):
    140         models = [
--> 141             clone(self.Base).fit(X, g, sample_weight=sample_weight) for g in grads.T
    142         ]
    143         fitted = np.array([m.predict(X) for m in models]).T

~/.local/lib/python3.7/site-packages/sklearn/ensemble/_forest.py in fit(self, X, y, sample_weight)
    328         self.n_outputs_ = y.shape[1]
    329 
--> 330         y, expanded_class_weight = self._validate_y_class_weight(y)
    331 
    332         if getattr(y, "dtype", None) != DOUBLE or not y.flags.contiguous:

~/.local/lib/python3.7/site-packages/sklearn/ensemble/_forest.py in _validate_y_class_weight(self, y)
    556 
    557     def _validate_y_class_weight(self, y):
--> 558         check_classification_targets(y)
    559 
    560         y = np.copy(y)

~/.local/lib/python3.7/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
    170     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    171                       'multilabel-indicator', 'multilabel-sequences']:
--> 172         raise ValueError("Unknown label type: %r" % y_type)
    173 
    174 

ValueError: Unknown label type: 'continuous'

alejandroschuler (Collaborator) commented Feb 4, 2021

I think your issue is that you're trying to use a classifier as the base learner. Even though the NGBoost model as a whole is doing (probabilistic) classification, the base learner must always be a regression model. That's because the job of the base learners is to help estimate a continuous value, in this case the logits of the class probabilities. This is also the case in other boosting implementations, e.g. an xgboost classifier still uses regression trees as its base learners.
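
To illustrate the point above, here is a minimal sketch using only NumPy and scikit-learn, not NGBoost itself. It assumes a simplified stand-in for what the booster hands to its base learner: the ordinary gradient of the log loss with respect to softmax logits (NGBoost uses a natural gradient, but both are real-valued). The synthetic data and the helper `try_fit` are hypothetical and exist only for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))          # synthetic features (assumption)
y = rng.integers(0, 3, size=200)       # class labels in {0, 1, 2}

# Start from uniform class probabilities and take the gradient of the
# log loss with respect to the logits: probabilities minus one-hot labels.
# This is a simplified stand-in for the gradients a booster would compute.
probs = np.full((200, 3), 1.0 / 3.0)
one_hot = np.eye(3)[y]
grads = probs - one_hot                # real-valued, shape (200, 3)


def try_fit(base, target):
    """Hypothetical helper: True if `base` accepts `target` as a fit target."""
    try:
        base.fit(X, target)
        return True
    except ValueError:
        # scikit-learn classifiers reject non-integer float targets
        return False


# The base learner is fit to one gradient column at a time.
classifier_ok = try_fit(DecisionTreeClassifier(max_depth=3), grads[:, 0])
regressor_ok = try_fit(DecisionTreeRegressor(max_depth=3), grads[:, 0])
print(classifier_ok, regressor_ok)
```

The classifier is rejected because its targets are continuous values rather than class labels, which is the same `ValueError` shown in the traceback, while the regressor fits without complaint. In the original snippet, swapping `RandomForestClassifier` for a regressor such as the `RandomForestRegressor` mentioned in the question (without `class_weight`, which regressors do not accept) should be enough.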
