
coef_ and intercept_ in MultinomialNB are pointless for binary classification #7233

Closed · luoq opened this issue Aug 24, 2016 · 12 comments

luoq commented Aug 24, 2016

MultinomialNB inherits coef_, intercept_ from BaseDiscreteNB defined by

    def _get_coef(self):
        return (self.feature_log_prob_[1:]
                if len(self.classes_) == 2 else self.feature_log_prob_)

    def _get_intercept(self):
        return (self.class_log_prior_[1:]
                if len(self.classes_) == 2 else self.class_log_prior_)

    coef_ = property(_get_coef)
    intercept_ = property(_get_intercept)

For binary classification, to be consistent with LinearClassifierMixin, coef_ (intercept_) should be the difference between the second and first rows (elements) of feature_log_prob_ (class_log_prior_), not just the second row (element).

@amueller (Member) commented:

The NBs are not consistent with LinearClassifierMixin IIRC.
I don't think I understand your comment on what they should be instead.

luoq commented Aug 25, 2016

The predict_proba method of MultinomialNB computes softmax(X @ feature_log_prob_.T + class_log_prior_).

For binary classification in LinearClassifierMixin, coef_ has shape (1, n_features) and intercept_ has length 1. We use p1 = logistic(X @ coef_.T + intercept_) as the probability of class 1 (self.classes_[1]) and 1 - p1 for class 0 (self.classes_[0]).

Since exp(s1)/(exp(s0)+exp(s1)) = 1/(1+exp(-(s1-s0))) = logistic(s1-s0), we could use the difference between the second and first rows (elements) of feature_log_prob_ (class_log_prior_) as coef_ (intercept_) and recover the same probabilities, making predict_proba identical to that of softmax logistic regression. We cannot write predict_proba using only the second row (element).
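
A minimal NumPy sketch of this identity (the scores s0 and s1 are toy values for illustration):

    import numpy as np

    def logistic(z):
        return 1 / (1 + np.exp(-z))

    # Two per-class decision scores for one sample (toy values).
    s0, s1 = -0.3, 1.2

    # The two-class softmax collapses to a logistic of the score difference.
    p1_softmax = np.exp(s1) / (np.exp(s0) + np.exp(s1))
    assert np.isclose(p1_softmax, logistic(s1 - s0))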

@amueller (Member) commented:

Sorry, I still don't know what you mean by "row". A row in what?

And I'm not sure what you are saying. Are you saying the results of MultinomialNB.predict_proba are wrong? If so, can you give a self-contained example of that?

I'm not sure how any of this relates to LinearClassifierMixin. It is not used anywhere in MultinomialNB.

luoq commented Aug 26, 2016

self.feature_log_prob_ is a 2D array; by "row" I mean a row of self.feature_log_prob_.

predict_proba is correct; it uses feature_log_prob_ and class_log_prior_ directly.

MultinomialNB is a linear classifier, even though it has no direct relation to LinearClassifierMixin in the code. In principle we could reuse the predict methods of softmax logistic regression if we set up coef_ and intercept_ as follows:

    def _get_coef(self):
        return (self.feature_log_prob_[1:]-self.feature_log_prob_[:1]
                if len(self.classes_) == 2 else self.feature_log_prob_)

    def _get_intercept(self):
        return (self.class_log_prior_[1:]-self.class_log_prior_[:1]
                if len(self.classes_) == 2 else self.class_log_prior_)

    coef_ = property(_get_coef)
    intercept_ = property(_get_intercept)

Otherwise, the point of defining coef_ and intercept_ on MultinomialNB at all is unclear; they do not seem to be used anywhere.
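
A self-contained sketch checking that equivalence against a fitted model (the toy data is made up for illustration; feature_log_prob_, class_log_prior_ and predict_proba are the actual scikit-learn attributes/methods):

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB

    rng = np.random.RandomState(0)
    X = rng.randint(0, 5, size=(100, 8))   # toy count features
    y = rng.randint(0, 2, size=100)

    clf = MultinomialNB().fit(X, y)

    # Proposed binary coef_/intercept_: second-minus-first differences.
    w = clf.feature_log_prob_[1] - clf.feature_log_prob_[0]
    b = clf.class_log_prior_[1] - clf.class_log_prior_[0]

    p1 = 1 / (1 + np.exp(-(X @ w + b)))
    assert np.allclose(p1, clf.predict_proba(X)[:, 1])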

luoq commented Aug 26, 2016

Maybe we should remove this code from BaseDiscreteNB; it is quite pointless for BernoulliNB.

Though BernoulliNB also has linear decision boundaries and could be adapted to be consistent with softmax logistic regression (reusing its predict methods), the correct setting of coef_ and intercept_ is quite different, as the sketch below shows.
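
A hedged sketch of why it differs: for binary features, log P(x|c) = x · (log p_c - log(1-p_c)) + sum_i log(1-p_ci), so the weights are per-feature log-odds differences and the feature-independent sums fold into the intercept along with the class prior (toy data for illustration):

    import numpy as np
    from sklearn.naive_bayes import BernoulliNB

    rng = np.random.RandomState(0)
    X = rng.binomial(1, 0.3, size=(200, 10))   # toy binary features
    y = rng.randint(0, 2, size=200)

    clf = BernoulliNB().fit(X, y)
    log_p = clf.feature_log_prob_              # log P(x_i = 1 | c)
    log_q = np.log(1 - np.exp(log_p))          # log P(x_i = 0 | c)

    # Weights: differences of per-feature log-odds between the classes.
    w = (log_p[1] - log_q[1]) - (log_p[0] - log_q[0])
    # Intercept: class-prior difference plus the feature-independent sums.
    b = (clf.class_log_prior_[1] - clf.class_log_prior_[0]
         + log_q[1].sum() - log_q[0].sum())

    p1 = 1 / (1 + np.exp(-(X @ w + b)))
    assert np.allclose(p1, clf.predict_proba(X)[:, 1])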

luoq commented Aug 26, 2016

This note in the MultinomialNB docstring may clarify my point about naive Bayes as a linear classifier:

    Notes
    -----
    For the rationale behind the names `coef_` and `intercept_`, i.e.
    naive Bayes as a linear classifier, see J. Rennie et al. (2003),
    Tackling the poor assumptions of naive Bayes text classifiers, ICML.

luoq commented Aug 26, 2016

self.coef_ is only used at line 507

        elif n_features != self.coef_.shape[1]:
            msg = "Number of features %d does not match previous data %d."
            raise ValueError(msg % (n_features, self.coef_.shape[-1]))

We can use feature_log_prob_ instead.
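
A sketch of the in-place replacement (a fragment of the same check, not standalone code):

    elif n_features != self.feature_log_prob_.shape[1]:
        msg = "Number of features %d does not match previous data %d."
        raise ValueError(msg % (n_features, self.feature_log_prob_.shape[1]))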

luoq commented Aug 26, 2016

These changes were introduced in de16cb4. Note the comment:

    # XXX The following is a stopgap measure; we need to set the dimensions
    # of class_log_prior_ and feature_log_prob_ correctly.

It seems they have been left there ever since.

The reason for changing the dimensions in the binary case is unclear. Maybe it was to mimic coef_ in LogisticRegression?

luoq commented Aug 26, 2016

We could simply remove coef_ and intercept_ from BaseDiscreteNB and forget about treating naive Bayes as a linear classifier.

Or we could fix coef_ and intercept_ for MultinomialNB and write correct versions for BernoulliNB. That would sometimes be helpful.

@amueller (Member) commented:

Ok, so you're talking about this: #2237

@amueller (Member) commented:

The problem with this is that changing the behavior might break people's code...

luoq closed this as completed Aug 27, 2016

luoq commented Aug 27, 2016

Duplicate of #2237.
