coef_ and intercept_ in MultinomialNB is pointless for binary classification #7233
Comments
The NBs are not consistent with LinearClassifierMixin, IIRC.
The predict_proba method of MultinomialNB computes softmax(X * feature_log_prob_.T + class_log_prior_). For binary classification in LinearClassifierMixin, coef_ has shape (1, n_features) and intercept_ has length 1; p1 = logistic(X * coef_.T + intercept_) is used as the probability for class 1 (self.classes_[1]) and 1 - p1 for class 0 (self.classes_[0]). Since exp(s1) / (exp(s0) + exp(s1)) = 1 / (1 + exp(-(s1 - s0))) = logistic(s1 - s0), we could use the difference between the second and first rows of coef_ (elements of intercept_) to get the same probabilities, making predict_proba identical to that of softmax logistic regression. We cannot write predict_proba using only the second row (element).
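As a sanity check of the softmax/logistic identity above, here is a minimal NumPy sketch (softmax and logistic are local helpers written for this example, not scikit-learn imports):

```python
import numpy as np

def softmax(scores):
    # scores: (n_samples, n_classes) array of per-class log scores
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)
s = rng.randn(5, 2)  # s[:, 0] is s0, s[:, 1] is s1
p1_softmax = softmax(s)[:, 1]
p1_logistic = logistic(s[:, 1] - s[:, 0])
assert np.allclose(p1_softmax, p1_logistic)
```

So for two classes, normalizing the two per-class scores with softmax is exactly the logistic function applied to their difference.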
Sorry, I still don't know what you mean by "row" — row in what? And I'm not sure what you are saying. I'm also not sure how any of this relates to LinearClassifierMixin; it is not used anywhere in the naive Bayes code.
self.feature_log_prob_ is a 2-D array; by "row" I mean a row of self.feature_log_prob_. predict_proba is correct; it uses feature_log_prob_ and class_log_prior_ directly. MultinomialNB is a linear classifier, even though it has no direct relation to LinearClassifierMixin in the code. In principle we could reuse the predict methods of softmax logistic regression if coef_ and intercept_ were set up as follows:

```python
def _get_coef(self):
    return (self.feature_log_prob_[1:] - self.feature_log_prob_[:1]
            if len(self.classes_) == 2 else self.feature_log_prob_)

def _get_intercept(self):
    return (self.class_log_prior_[1:] - self.class_log_prior_[:1]
            if len(self.classes_) == 2 else self.class_log_prior_)

coef_ = property(_get_coef)
intercept_ = property(_get_intercept)
```

Otherwise, the reason for setting up coef_ and intercept_ on MultinomialNB at all is strange; it seems they are not used anywhere.
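A quick way to check the proposed definitions numerically (a sketch assuming scikit-learn and NumPy are installed; the difference-based coef and intercept are computed outside the estimator here rather than patched into it):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(0)
X = rng.randint(5, size=(20, 6))
y = np.array([0, 1] * 10)  # guarantee both classes are present
clf = MultinomialNB().fit(X, y)

# Proposed binary-case definitions: differences of rows/elements
coef = clf.feature_log_prob_[1:] - clf.feature_log_prob_[:1]    # shape (1, n_features)
intercept = clf.class_log_prior_[1:] - clf.class_log_prior_[:1]  # shape (1,)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic-regression-style probability for class 1 matches predict_proba
p1 = logistic(X @ coef.T + intercept).ravel()
assert np.allclose(p1, clf.predict_proba(X)[:, 1])
```

With these definitions, binary MultinomialNB exposes the same coef_/intercept_ contract that LinearClassifierMixin assumes.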
Maybe we should remove this code from BaseDiscreteNB. It is quite pointless for BernoulliNB: although BernoulliNB has linear decision boundaries and could be adapted to be consistent with softmax logistic regression (reusing its predict methods), the required coef_ and intercept_ would be quite different.
This note in MultinomialNB may clarify my point regarding linear classifiers.
self.coef_ is only used at line 507:

```python
elif n_features != self.coef_.shape[1]:
    msg = "Number of features %d does not match previous data %d."
    raise ValueError(msg % (n_features, self.coef_.shape[-1]))
```

We can use feature_log_prob_ there instead.
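For illustration, the same feature-count information is available from feature_log_prob_, whose second dimension is n_features (a minimal sketch assuming scikit-learn is installed; the shape check is done outside the estimator here):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.array([[1, 2, 0],
              [0, 1, 3]])
y = [0, 1]
clf = MultinomialNB().fit(X, y)

# feature_log_prob_ has shape (n_classes, n_features), so the width of
# previously seen data can be read from it instead of from coef_
n_features_seen = clf.feature_log_prob_.shape[1]
assert n_features_seen == X.shape[1]
```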
These changes were introduced in de16cb4; note the comment there. It seems they have been left in place since then. The reason for changing the dimension in the binary case is unclear; maybe it was to mimic coef_ in LogisticRegression?
We could simply remove coef_ and intercept_ from BaseDiscreteNB and drop the idea of treating naive Bayes as a linear classifier. Or we could fix coef_ and intercept_ for MultinomialNB and write correct versions for BernoulliNB; having them is sometimes helpful.
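One example of where the linear-classifier view is helpful: ranking features by their weight toward one class. This sketch uses feature_log_prob_ directly with the row difference proposed above (the toy vocabulary and counts are made up for illustration):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Tiny toy term-count matrix; vocabulary is hypothetical
vocab = np.array(["ball", "game", "score", "vote", "party", "law"])
X = np.array([
    [3, 2, 2, 0, 0, 0],   # sports-like documents (class 0)
    [2, 3, 1, 0, 1, 0],
    [0, 0, 0, 3, 2, 2],   # politics-like documents (class 1)
    [0, 1, 0, 2, 3, 2],
])
y = np.array([0, 0, 1, 1])

clf = MultinomialNB().fit(X, y)
# Linear-classifier view: per-feature weight toward class 1
weights = clf.feature_log_prob_[1] - clf.feature_log_prob_[0]
top = vocab[np.argsort(weights)[::-1][:3]]
# top contains the politics-flavored terms
```

The large positive weights pick out the terms most indicative of class 1, just as inspecting coef_ would for LogisticRegression.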
Ok, so you're talking about this: #2237
The problem is that changing behavior like this might break people's code...
Duplicate of #2237.
MultinomialNB inherits coef_ and intercept_ from BaseDiscreteNB. For binary classification, in order to be consistent with LinearClassifierMixin, coef_ (intercept_) should be the difference between the second and first rows (elements).