[MRG+2] GaussianNB(): new parameter var_smoothing #9681

Merged
merged 14 commits into scikit-learn:master from Mottl:native_bayes_epsilon on Sep 18, 2017

Conversation

Mottl
Contributor

@Mottl Mottl commented Sep 3, 2017

This pull request changes epsilon to 1/20 of the maximum variance to improve prediction strength.
MNIST handwritten digits recognition test:
score before: 0.563
score after: 0.798
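Very roughly, the proposed smoothing amounts to something like the sketch below (illustrative names only; whether the term is added to every variance or used as a floor is debated later in this thread):

```python
import numpy as np

def smooth_variances(X, factor=1.0 / 20):
    # Boost every per-feature variance by a fraction of the largest one.
    var = np.var(X, axis=0)
    epsilon = factor * var.max()  # 1/20 of the maximum feature variance
    return var + epsilon
```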

@Mottl Mottl changed the title Changed epsilon to improve prediction strength of Naive Bayes classification [MRG] Changed epsilon to improve prediction strength of Naive Bayes classification Sep 3, 2017
@Mottl Mottl changed the title [MRG] Changed epsilon to improve prediction strength of Naive Bayes classification Changed epsilon to improve prediction strength of Naive Bayes classification Sep 3, 2017
@jnothman
Member

jnothman commented Sep 3, 2017

On what basis can we assume/know this will not degrade performance elsewhere?

@jnothman
Member

jnothman commented Sep 3, 2017

It's obviously changing other results in our tests

@Mottl Mottl changed the title Changed epsilon to improve prediction strength of Naive Bayes classification [WIP] Changed epsilon to improve prediction strength of Naive Bayes classification Sep 4, 2017
@agramfort
Member

I agree with @jnothman

If you want to do something like this, I would add a parameter like min_std so you can adjust it.

@Mottl
Contributor Author

Mottl commented Sep 5, 2017

I've added a min_variance parameter to GaussianNB(), which by default is calculated as 1e-9 multiplied by the maximum variance across all dimensions. It behaves much like adding an epsilon to the variance, as in the current code.
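As a sketch (illustrative helper name, not the actual diff), the default described above would be computed as:

```python
import numpy as np

def default_min_variance(X, rel=1e-9):
    # 1e-9 times the largest per-feature variance of the training data.
    return rel * np.var(X, axis=0).max()
```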

@Mottl Mottl changed the title [WIP] Changed epsilon to improve prediction strength of Naive Bayes classification [MRG] GaussianNB(): new parameter min_variance Sep 5, 2017
@amueller
Member

amueller commented Sep 6, 2017

I was thinking about this a couple of weeks ago and couldn't find any references on it. I think it's a bit unclear whether you want a minimum or an additive constant. I think a Bayesian prior would be an additive constant, right? I guess in practice it doesn't make a lot of difference. Having a reference would be good, though.

@amueller
Member

amueller commented Sep 6, 2017

Actually a Bayesian prior would pull it towards a specific value, not be an additive constant.... That might be the most natural thing to implement imho?

@Mottl
Contributor Author

Mottl commented Sep 7, 2017

Adding ε (or setting a minimum variance) serves a single purpose: to keep the Gaussian PDF computable, since σ appears in the denominator:

f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

It is not obvious to me why we should add ε to the variances of all features rather than only to those that produce a division-by-zero error.

So it seems to me that using a minimum variance is slightly better than adding ε to all variances, but the difference is negligible.
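To make the two options being discussed concrete, here is a small sketch (names are illustrative):

```python
import numpy as np

def additive_epsilon(var, eps):
    # Add eps to every per-feature variance.
    return var + eps

def variance_floor(var, min_variance):
    # Only lift variances that fall below the floor.
    return np.maximum(var, min_variance)

var = np.array([0.0, 1e-12, 0.5, 2.0])
print(additive_epsilon(var, 1e-9))  # every entry shifts by 1e-9
print(variance_floor(var, 1e-9))    # only the near-zero entries change
```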

@@ -354,11 +363,13 @@ def _partial_fit(self, X, y, classes=None, _refit=False,
n_classes = len(self.classes_)
self.theta_ = np.zeros((n_classes, n_features))
self.sigma_ = np.zeros((n_classes, n_features))
# create a 2d-array of uncorrected variances for further use
# in _update_mean_variance()
self.sigma_uncorrected_ = np.zeros((n_classes, n_features))
Member

@agramfort agramfort Sep 7, 2017

Why not use the same trick as before (cf. below)?

# put epsilon back each time?

Contributor Author

Because we don't use a constant epsilon anymore.

Contributor Author

@Mottl Mottl Sep 7, 2017

And we can't subtract min_variance from self.sigma_ (that way we'd always get zero when computing the previous sigma).

@Mottl
Contributor Author

Mottl commented Sep 7, 2017

So far I can't find any difference between adding epsilon and setting a minimum variance when using naive Bayes. But I found that adding epsilon is more stable when used in non-naive Bayes classification:

Non-naive Bayes classification
==============================
MNIST handwritten digits
Train dataset: (1000, 784), test dataset: (1000, 784)
Score (epsilon=1e-09)       = 0.832
Score (min_variance=1e-09)  = 0.832
Score (epsilon=1e-08)       = 0.919
Score (min_variance=1e-08)  = 0.48
Score (epsilon=1e-07)       = 0.919
Score (min_variance=1e-07)  = 0.479
Score (epsilon=1e-06)       = 0.919
Score (min_variance=1e-06)  = 0.487
Score (epsilon=1e-05)       = 0.919
Score (min_variance=1e-05)  = 0.496
Score (epsilon=0.0001)      = 0.919
Score (min_variance=0.0001) = 0.588
Score (epsilon=0.001)       = 0.921
Score (min_variance=0.001)  = 0.668
Score (epsilon=0.01)        = 0.933
Score (min_variance=0.01)   = 0.729
Score (epsilon=0.1)         = 0.936
Score (min_variance=0.1)    = 0.488
Score (epsilon=1)           = 0.865
Score (min_variance=1)      = 0.928
Score (epsilon=10)          = 0.709
Score (min_variance=10)     = 0.87
Score (epsilon=100)         = 0.351
Score (min_variance=100)    = 0.204

@Mottl Mottl changed the title [MRG] GaussianNB(): new parameter min_variance [WIP] GaussianNB(): new parameter min_variance Sep 8, 2017
@Mottl Mottl changed the title [WIP] GaussianNB(): new parameter min_variance [WIP] GaussianNB(): new parameter epsilon Sep 8, 2017
@Mottl Mottl changed the title [WIP] GaussianNB(): new parameter epsilon [MRG] GaussianNB(): new parameter epsilon Sep 8, 2017
@amueller
Member

amueller commented Sep 8, 2017

What do you mean by non-naive Bayes?

@amueller
Member

amueller commented Sep 8, 2017

btw, this dataset is very non-Gaussian and therefore a very odd example. I think I would usually assume that people use the StandardScaler before GaussianNB and that the data is at least somewhat Gaussian.
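For reference, the kind of pipeline assumed here would look roughly like this (standard scikit-learn API, nothing specific to this PR):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
# Standardize features before GaussianNB, as suggested above.
clf = make_pipeline(StandardScaler(), GaussianNB())
print(cross_val_score(clf, X, y, cv=5).mean())
```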

@Mottl
Contributor Author

Mottl commented Sep 8, 2017

Non-naive Bayes means using the full covariance matrix instead of just a variance vector in the multivariate Gaussian PDF (see scipy.stats.multivariate_normal.logpdf()).
I'm planning to add a non-naive Bayes class in another PR in a couple of days. That's why sharing the same epsilon parameter becomes somewhat preferable.
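A rough sketch of what "non-naive" Bayes means here (a full per-class covariance matrix in the log-density, essentially what QDA does, as noted below); this is illustrative code, not part of the PR:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_full_cov(X, y, reg=1e-3):
    # Per-class mean, full covariance (plus a small ridge), and log prior.
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mean = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False) + reg * np.eye(X.shape[1])
        params[c] = (mean, cov, np.log(len(Xc) / len(X)))
    return params

def predict_full_cov(X, params):
    classes = sorted(params)
    scores = np.column_stack([
        multivariate_normal.logpdf(X, mean=m, cov=S) + prior
        for m, S, prior in (params[c] for c in classes)
    ])
    return np.asarray(classes)[scores.argmax(axis=1)]
```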

@Mottl
Contributor Author

Mottl commented Sep 8, 2017

What dataset do you suggest for testing Naive Bayes?

@agramfort
Member

Non-naive Bayes with a full covariance matrix is quadratic discriminant analysis (aka QDA), which is already in sklearn.

@agramfort
Member

epsilon is too generic a name. Please try to improve the variable name and the docstring. I'm not opposed to exposing this internal epsilon parameter.

@Mottl
Contributor Author

Mottl commented Sep 12, 2017

Does smoothing look better than epsilon?

@agramfort
Member

agramfort commented Sep 13, 2017 via email

@Mottl
Contributor Author

Mottl commented Sep 13, 2017

It is not a scaling parameter but an additive one. And since 3de55d0 it is no longer a minimum variance but a value added to all variances.
var_smoothing?
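If I read the final version correctly, the semantics are roughly the following (a sketch; the variable names are illustrative and the merged code may differ in detail):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(100, 5)

# var_smoothing is a dimensionless fraction of the largest feature variance.
var_smoothing = 1e-9
epsilon = var_smoothing * np.var(X, axis=0).max()

# epsilon is then added to every variance estimate rather than used as a floor.
smoothed = np.var(X, axis=0) + epsilon
```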

@agramfort
Member

agramfort commented Sep 14, 2017 via email

@Mottl
Contributor Author

Mottl commented Sep 15, 2017

@agramfort
Done.

@agramfort agramfort changed the title [MRG] GaussianNB(): new parameter epsilon [MRG+1] GaussianNB(): new parameter epsilon Sep 17, 2017
@agramfort
Member

+1 for MRG

Member

@jnothman jnothman left a comment

Looks great! Please add an entry to doc/whats_new, and we'll merge. Thanks!

@jnothman jnothman changed the title [MRG+1] GaussianNB(): new parameter epsilon [MRG+2] GaussianNB(): new parameter epsilon Sep 18, 2017
@lesteve
Member

lesteve commented Sep 18, 2017

I added the PR link in whats_new; this way it is easier to find the details about the change.

@Mottl Mottl changed the title [MRG+2] GaussianNB(): new parameter epsilon [MRG+2] GaussianNB(): new parameter var_smoothing Sep 18, 2017
@lesteve
Member

lesteve commented Sep 18, 2017

This one should be merged when the CIs are green.

@Mottl
Contributor Author

Mottl commented Sep 18, 2017

Thank you all

@lesteve
Member

lesteve commented Sep 18, 2017

Merging, thanks a lot @Mottl!

@lesteve lesteve merged commit 08b524d into scikit-learn:master Sep 18, 2017
@Mottl Mottl deleted the native_bayes_epsilon branch September 18, 2017 12:45
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
@hongwen000

I hope the purpose of adding this parameter will be documented. I am taking an ML course and working on a classification problem. My teacher asked me to explain in the report how I tuned the models I selected, including the meaning of their parameters. I was stuck, unable to find out what var_smoothing really means, until I found this...
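For anyone landing here with the same question: var_smoothing is the fraction of the largest feature variance that is added to all per-feature variances for numerical stability, and it can be tuned like any other hyperparameter, e.g. (assuming a scikit-learn version that includes this PR):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
param_grid = {"var_smoothing": [1e-9, 1e-7, 1e-5, 1e-3, 1e-1]}
search = GridSearchCV(GaussianNB(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```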
