
coef_ and intercept_ in MultinomialNB are pointless for binary classification #7233

Closed · luoq opened this issue Aug 24, 2016 · 12 comments

luoq commented Aug 24, 2016

MultinomialNB inherits coef_, intercept_ from BaseDiscreteNB defined by

    def _get_coef(self):
        return (self.feature_log_prob_[1:]
                if len(self.classes_) == 2 else self.feature_log_prob_)

    def _get_intercept(self):
        return (self.class_log_prior_[1:]
                if len(self.classes_) == 2 else self.class_log_prior_)

    coef_ = property(_get_coef)
    intercept_ = property(_get_intercept)

For binary classification, to be consistent with LinearClassifierMixin, coef_ (intercept_) should be the difference between the second and first rows (elements) of feature_log_prob_ (class_log_prior_), not just the second row (element).

@amueller (Member) commented:

The NBs are not consistent with LinearClassifierMixin IIRC.
I don't think I understand your comment on what they should be instead.

luoq commented Aug 25, 2016

The predict_proba method of MultinomialNB computes softmax(X @ feature_log_prob_.T + class_log_prior_).

For binary classification in LinearClassifierMixin, coef_ has shape (1, n_features) and intercept_ has length 1. We use p1 = logistic(X @ coef_.T + intercept_) as the probability of class 1 (self.classes_[1]) and 1 - p1 for class 0 (self.classes_[0]).

Since exp(s1)/(exp(s0)+exp(s1)) = 1/(1+exp(-(s1-s0))) = logistic(s1-s0), we could use the difference between the second and first rows (elements) of feature_log_prob_ (class_log_prior_) as coef_ (intercept_) and recover the same probabilities, making predict_proba identical to that of softmax logistic regression. We cannot write predict_proba using only the second row (element).
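
A minimal NumPy sketch of this identity (the scores s0 and s1 are toy values for illustration):

    import numpy as np

    def logistic(z):
        return 1 / (1 + np.exp(-z))

    # Two per-class decision scores for one sample (toy values).
    s0, s1 = -0.3, 1.2

    # The two-class softmax collapses to a logistic of the score difference.
    p1_softmax = np.exp(s1) / (np.exp(s0) + np.exp(s1))
    assert np.isclose(p1_softmax, logistic(s1 - s0))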

@amueller (Member) commented:

Sorry, I still don't know what you mean by "row". A row in what?

And I'm not sure what you are saying. Are you saying the results of MultinomialNB.predict_proba are wrong? If so, can you give a self-contained example of that?

I'm not sure how any of this relates to LinearClassifierMixin. It is not used anywhere in MultinomialNB.

luoq commented Aug 26, 2016

self.feature_log_prob_ is a 2D array; by "row" I mean a row of self.feature_log_prob_.

predict_proba is correct; it uses feature_log_prob_ and class_log_prior_ directly.

MultinomialNB is a linear classifier, even though it has no direct relation to LinearClassifierMixin in the code. In principle we could reuse the predict methods of softmax logistic regression if we set up coef_ and intercept_ as follows:

    def _get_coef(self):
        return (self.feature_log_prob_[1:]-self.feature_log_prob_[:1]
                if len(self.classes_) == 2 else self.feature_log_prob_)

    def _get_intercept(self):
        return (self.class_log_prior_[1:]-self.class_log_prior_[:1]
                if len(self.classes_) == 2 else self.class_log_prior_)

    coef_ = property(_get_coef)
    intercept_ = property(_get_intercept)

Otherwise, the point of defining coef_ and intercept_ on MultinomialNB at all is unclear; they do not seem to be used anywhere.
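
A self-contained sketch checking that equivalence against a fitted model (the toy data is made up for illustration; feature_log_prob_, class_log_prior_ and predict_proba are the actual scikit-learn attributes/methods):

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB

    rng = np.random.RandomState(0)
    X = rng.randint(0, 5, size=(100, 8))   # toy count features
    y = rng.randint(0, 2, size=100)

    clf = MultinomialNB().fit(X, y)

    # Proposed binary coef_/intercept_: second-minus-first differences.
    w = clf.feature_log_prob_[1] - clf.feature_log_prob_[0]
    b = clf.class_log_prior_[1] - clf.class_log_prior_[0]

    p1 = 1 / (1 + np.exp(-(X @ w + b)))
    assert np.allclose(p1, clf.predict_proba(X)[:, 1])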

luoq commented Aug 26, 2016

Maybe we should remove this code from BaseDiscreteNB; it is quite pointless for BernoulliNB.

Though BernoulliNB also has linear decision boundaries and could be adapted to be consistent with softmax logistic regression (reusing its predict methods), the correct setting of coef_ and intercept_ is quite different, as the sketch below shows.
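
A hedged sketch of why it differs: for binary features, log P(x|c) = x · (log p_c - log(1-p_c)) + sum_i log(1-p_ci), so the weights are per-feature log-odds differences and the feature-independent sums fold into the intercept along with the class prior (toy data for illustration):

    import numpy as np
    from sklearn.naive_bayes import BernoulliNB

    rng = np.random.RandomState(0)
    X = rng.binomial(1, 0.3, size=(200, 10))   # toy binary features
    y = rng.randint(0, 2, size=200)

    clf = BernoulliNB().fit(X, y)
    log_p = clf.feature_log_prob_              # log P(x_i = 1 | c)
    log_q = np.log(1 - np.exp(log_p))          # log P(x_i = 0 | c)

    # Weights: differences of per-feature log-odds between the classes.
    w = (log_p[1] - log_q[1]) - (log_p[0] - log_q[0])
    # Intercept: class-prior difference plus the feature-independent sums.
    b = (clf.class_log_prior_[1] - clf.class_log_prior_[0]
         + log_q[1].sum() - log_q[0].sum())

    p1 = 1 / (1 + np.exp(-(X @ w + b)))
    assert np.allclose(p1, clf.predict_proba(X)[:, 1])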

luoq commented Aug 26, 2016

This note in the MultinomialNB docstring may clarify my point about naive Bayes as a linear classifier:

    Notes
    -----
    For the rationale behind the names `coef_` and `intercept_`, i.e.
    naive Bayes as a linear classifier, see J. Rennie et al. (2003),
    Tackling the poor assumptions of naive Bayes text classifiers, ICML.

luoq commented Aug 26, 2016

self.coef_ is only used at line 507

        elif n_features != self.coef_.shape[1]:
            msg = "Number of features %d does not match previous data %d."
            raise ValueError(msg % (n_features, self.coef_.shape[-1]))

We can use feature_log_prob_ instead.
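
A sketch of the in-place replacement (a fragment of the same check, not standalone code):

    elif n_features != self.feature_log_prob_.shape[1]:
        msg = "Number of features %d does not match previous data %d."
        raise ValueError(msg % (n_features, self.feature_log_prob_.shape[1]))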

luoq commented Aug 26, 2016

These changes were introduced in de16cb4. Note the comment:

    # XXX The following is a stopgap measure; we need to set the dimensions
    # of class_log_prior_ and feature_log_prob_ correctly.

It seems they have been left there ever since.

The reason for changing the dimensions in the binary case is unclear. Maybe it was to mimic coef_ in LogisticRegression?

luoq commented Aug 26, 2016

We could simply remove coef_ and intercept_ from BaseDiscreteNB and forget about treating naive Bayes as a linear classifier.

Or we could fix coef_ and intercept_ for MultinomialNB and write correct versions for BernoulliNB. That would sometimes be helpful.

@amueller (Member) commented:

Ok, so you're talking about this: #2237

@amueller (Member) commented:

The problem with this is that changing the behavior might break people's code...

luoq closed this as completed Aug 27, 2016

luoq commented Aug 27, 2016

Duplicate of #2237.
