Vector pseudo-counts in MultinomialNB broken from 0.18.1 to 0.19.1

Although it was not officially documented, in scikit-learn 0.18.1 the pseudo-counts (`alpha`) could be vectors (`np.array`s) with the same length as the number of features. This is quite meaningful behaviour, because the feature probability estimates can be interpreted as MAP estimates in a Dirichlet-multinomial model, where the pseudo-counts correspond to the parameters of a Dirichlet prior distribution over the probabilities.
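For reference, the smoothed per-class estimate behind `feature_log_prob_` can be written out as follows. The notation is introduced here for illustration: $N_{ci}$ is the count of feature $i$ in class $c$, $N_c$ is the total count for class $c$, and $n$ is the number of features. A per-feature $\alpha_i$ slots directly into the numerator.

$$
\hat{\theta}_{ci} = \frac{N_{ci} + \alpha_i}{N_c + \sum_{j=1}^{n} \alpha_j},
\qquad
N_c = \sum_{j=1}^{n} N_{cj}
$$

With a $\mathrm{Dir}(\alpha_1, \dots, \alpha_n)$ prior on $\theta_c$, the posterior is $\mathrm{Dir}(N_{c1} + \alpha_1, \dots, N_{cn} + \alpha_n)$, and $\hat{\theta}_{ci}$ is its posterior mean. Its mode (the MAP estimate) is $(N_{ci} + \alpha_i - 1)/(N_c + \sum_j \alpha_j - n)$, so $\hat{\theta}_{ci}$ is equivalently the MAP estimate under a $\mathrm{Dir}(\alpha_1 + 1, \dots, \alpha_n + 1)$ prior (for $\alpha_i > 0$).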

```python
from sklearn.naive_bayes import MultinomialNB
import numpy as np

# One pseudo-count per feature (three features)
mnb = MultinomialNB(alpha=np.array([1., 3., 2.]), fit_prior=False)
mnb.partial_fit(X=np.array([[1., 1., 1.]]), y=np.array(['a']), classes=['a', 'b'])
mnb.feature_log_prob_
```

In scikit-learn 0.18.1 this gives the following output:
```
array([[-1.5040774 , -0.81093022, -1.09861229],
       [-1.79175947, -0.69314718, -1.09861229]])
```
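These numbers can be reproduced by hand with NumPy, assuming the smoothing formula $(N_{ci} + \alpha_i) / (N_c + \sum_j \alpha_j)$. The helper `smoothed_feature_log_prob` below is hypothetical, a sketch for checking the arithmetic rather than scikit-learn's own code.

```python
import numpy as np

def smoothed_feature_log_prob(feature_count, alpha):
    """Hypothetical helper: log((N_ci + alpha_i) / sum_j (N_cj + alpha_j)) per class row."""
    # alpha (shape: n_features) broadcasts across every class row
    smoothed = feature_count + alpha
    # Normalise each row by its own total, in log space
    return np.log(smoothed) - np.log(smoothed.sum(axis=1, keepdims=True))

# Per-class feature counts after partial_fit on the single sample [1, 1, 1] labelled 'a':
# class 'a' has counts [1, 1, 1], class 'b' has seen no data yet.
feature_count = np.array([[1., 1., 1.],
                          [0., 0., 0.]])
alpha = np.array([1., 3., 2.])

log_prob = smoothed_feature_log_prob(feature_count, alpha)
print(log_prob)
```

For class `'a'` the smoothed counts are `[2, 4, 3]` out of 9, and for class `'b'` they are just the pseudo-counts `[1, 3, 2]` out of 6, which matches the 0.18.1 output. Because `feature_count + alpha` broadcasts, a vector `alpha` works with no extra code in the estimate itself.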

However, in 0.19.1 the same code fails in `partial_fit`:
```
    465
    466     def _check_alpha(self):
--> 467         if self.alpha < 0:
    468             raise ValueError('Smoothing parameter alpha = %.1e. '
    469                              'alpha should be > 0.' % self.alpha)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

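This behaviour was introduced in commit b4b5de8cf9748a07d8f3a2d1fc89ccaacdf6576f, which added a check for negative alpha values. The check `if self.alpha < 0:` assumes a scalar, so for an array `alpha` the comparison returns a boolean array whose truth value is ambiguous.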
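A minimal sketch of the failure and of a vector-aware check follows. The function `check_alpha` and its `n_features` parameter are hypothetical, written for illustration; this is not the fix scikit-learn actually adopted.

```python
import numpy as np

alpha = np.array([1., 3., 2.])

# The scalar-style check from the traceback: `alpha < 0` yields a boolean array,
# and using a multi-element boolean array in `if` raises ValueError.
try:
    if alpha < 0:
        pass
    scalar_check_failed = False
except ValueError:
    scalar_check_failed = True

def check_alpha(alpha, n_features=None):
    """Hypothetical check accepting a scalar or a per-feature array of pseudo-counts."""
    alpha = np.asarray(alpha, dtype=float)
    # np.any reduces the element-wise comparison to a single bool
    if np.any(alpha < 0):
        raise ValueError('Smoothing parameter alpha = %s. '
                         'alpha should be >= 0.' % alpha)
    # Optionally require one pseudo-count per feature when alpha is an array
    if alpha.ndim > 0 and n_features is not None and alpha.shape != (n_features,):
        raise ValueError('alpha should be a scalar or have shape (n_features,).')
    return alpha

print(scalar_check_failed)
print(check_alpha(alpha, n_features=3))
```

Reducing the comparison with `np.any` keeps the original intent (reject negative pseudo-counts) while treating a scalar and a vector `alpha` uniformly.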


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Vector pseudo-counts in MultinomialNB broken from 0.18.1 to 0.19.1 #10346
