Closed
Description
Although not officially documented, in scikit-learn v0.18.1 the pseudo-counts (alpha) could be vectors (np.array) with the same length as the number of features. This is quite meaningful behaviour, as the feature probability estimates can be interpreted as MAP estimates in a Dirichlet-multinomial model, where the pseudo-counts correspond to the parameters of a Dirichlet prior distribution over the probabilities.
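For reference, the smoothed estimate stored (as a log) in `feature_log_prob_` can be written per class c and feature i as follows. Here N_{ci} is the summed count of feature i over the training samples of class c, N_c is the total count for class c, and n is the number of features. With a vector alpha each feature gets its own pseudo-count α_i; a scalar alpha is the special case α_i = α for all i.

$$
\hat\theta_{ci} = \frac{N_{ci} + \alpha_i}{N_c + \sum_{j=1}^{n} \alpha_j},
\qquad N_c = \sum_{j=1}^{n} N_{cj},
\qquad \texttt{feature\_log\_prob\_}[c, i] = \log \hat\theta_{ci}
$$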
```python
from sklearn.naive_bayes import MultinomialNB
import numpy as np

# One pseudo-count per feature (3 features)
mnb = MultinomialNB(alpha=np.array([1., 3., 2.]), fit_prior=False)
# A single sample of class 'a'; class 'b' has no samples yet
mnb.partial_fit(X=np.array([[1., 1., 1.]]), y=np.array(['a']), classes=['a', 'b'])
mnb.feature_log_prob_
```
This gives the following output in scikit-learn v0.18.1:
```
array([[-1.5040774 , -0.81093022, -1.09861229],
       [-1.79175947, -0.69314718, -1.09861229]])
```
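The numbers above can be checked by hand. This is a minimal NumPy sketch, assuming the smoothing works as described in this issue (pseudo-counts added feature-wise to the per-class counts, then each row normalised by its smoothed total); the variable names are illustrative and are not scikit-learn attributes.

```python
import numpy as np

# Inputs from the example: one sample of class 'a', no samples of class 'b'.
alpha = np.array([1., 3., 2.])            # per-feature pseudo-counts
feature_count = np.array([[1., 1., 1.],   # class 'a': counts of its single sample
                          [0., 0., 0.]])  # class 'b': nothing seen yet

# Add the pseudo-counts feature-wise (broadcast over the class rows),
# then normalise each row by its smoothed total.
smoothed_fc = feature_count + alpha                   # [[2, 4, 3], [1, 3, 2]]
smoothed_cc = smoothed_fc.sum(axis=1, keepdims=True)  # [[9], [6]]
feature_log_prob = np.log(smoothed_fc) - np.log(smoothed_cc)

print(feature_log_prob)
```

Row 'a' is log([2, 4, 3] / 9) and row 'b' is log([1, 3, 2] / 6), which matches the v0.18.1 output.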
However, in v0.19.1 the same code fails in `partial_fit`:
```
    465 
    466     def _check_alpha(self):
--> 467         if self.alpha < 0:
    468             raise ValueError('Smoothing parameter alpha = %.1e. '
    469                              'alpha should be > 0.' % self.alpha)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
This behaviour was introduced in commit b4b5de8, which added a check for negative alpha values.
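A minimal reproduction of why the check fails, followed by a sketch of an array-aware alternative. `check_alpha` is a hypothetical helper written for illustration only; it is not scikit-learn's `_check_alpha`, and requiring a vector alpha to have exactly `n_features` entries is an assumption based on the behaviour described above.

```python
import numpy as np

alpha = np.array([1., 3., 2.])

# `alpha < 0` is an elementwise comparison that returns a boolean array,
# so using it directly in an `if` raises the ValueError from the traceback.
message = None
try:
    if alpha < 0:
        pass
except ValueError as exc:
    message = str(exc)
print(message)


def check_alpha(alpha, n_features):
    """Hypothetical array-aware check; not scikit-learn's own _check_alpha."""
    alpha = np.asarray(alpha, dtype=float)
    # Accept a scalar (0-d) or a 1-D array with one pseudo-count per feature.
    if alpha.ndim > 1 or (alpha.ndim == 1 and alpha.shape[0] != n_features):
        raise ValueError("alpha should be a scalar or an array of shape (n_features,)")
    # np.min reduces the array to a single value, so the comparison is unambiguous.
    if np.min(alpha) < 0:
        raise ValueError("Smoothing parameter alpha = %.1e. "
                         "alpha should be >= 0." % np.min(alpha))
    return alpha


print(check_alpha(alpha, n_features=3))  # [1. 3. 2.]
print(check_alpha(1.0, n_features=3))    # 1.0
```

Reducing the comparison with `np.min` (or `np.any(alpha < 0)`) keeps the scalar case working unchanged while also accepting per-feature pseudo-counts.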