Vector pseudo-counts in MultinomialNB broken from 0.18.1 to 0.19.1 #10346

@TobiasMadsen

Description

Although not officially documented, in scikit-learn v. 0.18.1 the pseudo-counts (alpha) could be vectors (np.array) with the same length as the number of features. This is quite meaningful behaviour, as the feature probability estimates can be interpreted as MAP estimates in a Dirichlet-multinomial model, where the pseudo-counts correspond to the parameters of a Dirichlet prior distribution over the probabilities.

from sklearn.naive_bayes import MultinomialNB
import numpy as np

mnb = MultinomialNB(alpha=np.array([1., 3., 2.]), fit_prior=False)
mnb.partial_fit(X=np.array([[1., 1., 1.]]), y=np.array(['a']), classes=['a', 'b'])
mnb.feature_log_prob_

Gives the following output in scikit-learn v. 0.18.1:

array([[-1.5040774 , -0.81093022, -1.09861229],
       [-1.79175947, -0.69314718, -1.09861229]])
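These values are what per-feature additive smoothing would be expected to give. As a rough check, here is a minimal sketch in plain Python, independent of scikit-learn, that computes log((count_j + alpha_j) / sum_k (count_k + alpha_k)) for each class. The helper name smoothed_log_prob is made up for this illustration and is not part of any library.

```python
import math

def smoothed_log_prob(counts, alpha):
    """Log of (count_j + alpha_j) / sum_k (count_k + alpha_k) for one class."""
    # Add the per-feature pseudo-count to each observed feature count.
    smoothed = [c + a for c, a in zip(counts, alpha)]
    total = sum(smoothed)
    return [math.log(s / total) for s in smoothed]

alpha = [1.0, 3.0, 2.0]
# Class 'a' was fitted on the single sample [1, 1, 1].
log_prob_a = smoothed_log_prob([1.0, 1.0, 1.0], alpha)
# Class 'b' received no samples, so only the pseudo-counts contribute.
log_prob_b = smoothed_log_prob([0.0, 0.0, 0.0], alpha)

print(log_prob_a)
print(log_prob_b)
```

The two lists agree with the rows of feature_log_prob_ above to the printed precision, which suggests that 0.18.1 simply broadcast the alpha vector over the feature counts.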

However, in v. 0.19.1 the code fails at partial_fit:

    465
    466     def _check_alpha(self):
--> 467         if self.alpha < 0:
    468             raise ValueError('Smoothing parameter alpha = %.1e. '
    469                              'alpha should be > 0.' % self.alpha)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This regression was introduced in commit b4b5de8, which added a check for negative alpha values. The check compares self.alpha to 0 directly, which only works when alpha is a scalar.
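One possible direction, offered only as a sketch and not as scikit-learn's actual code, would be to make the negativity check array-aware. The standalone function check_alpha and its n_features argument below are hypothetical names chosen for illustration; the idea is to convert alpha to an array, verify its shape, and test the minimum element rather than the array itself.

```python
import numpy as np

def check_alpha(alpha, n_features):
    """Hypothetical validator: accept a scalar or a per-feature vector."""
    alpha = np.asarray(alpha, dtype=float)
    # A vector alpha must provide exactly one pseudo-count per feature.
    if alpha.ndim > 1 or (alpha.ndim == 1 and alpha.shape[0] != n_features):
        raise ValueError(
            "alpha should be a scalar or an array of shape (%d,), got shape %s"
            % (n_features, alpha.shape)
        )
    # np.min works for 0-d and 1-d arrays alike, so the comparison
    # always yields a single boolean instead of an ambiguous array.
    if np.min(alpha) < 0:
        raise ValueError(
            "Smoothing parameter alpha = %.1e. alpha should be > 0."
            % np.min(alpha)
        )
    return alpha

print(check_alpha(1.0, 3))
print(check_alpha(np.array([1., 3., 2.]), 3))
```

With a check along these lines, both the scalar case and the vector case from the example above would pass validation, while a negative entry anywhere in the vector would still raise ValueError.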
