[MRG+2] GaussianNB(): new parameter `var_smoothing` #9681

Merged
merged 14 commits into scikit-learn:master from Mottl:native_bayes_epsilon on Sep 18, 2017

Conversation

5 participants
@Mottl
Contributor

Mottl commented Sep 3, 2017

This pull request changes epsilon to 1/20 of the maximum variance to improve prediction strength.
MNIST handwritten digits recognition test:
score before: 0.563
score after: 0.798
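
For readers who want to try a comparable experiment, here is a minimal sketch (not the benchmark above): it varies the smoothing value on scikit-learn's bundled digits dataset as a stand-in for MNIST, using the var_smoothing parameter this PR eventually introduces; the scores will differ from the numbers quoted above.

```python
# Hedged sketch: compare GaussianNB scores for the default smoothing vs. a
# much larger value, on the bundled digits dataset (a stand-in for MNIST).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for smoothing in (1e-9, 1e-2):  # default vs. a larger smoothing value
    clf = GaussianNB(var_smoothing=smoothing).fit(X_train, y_train)
    print(smoothing, clf.score(X_test, y_test))
```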

@Mottl Mottl changed the title from Changed epsilon to improve prediction strength of Naive Bayes classification to [MRG] Changed epsilon to improve prediction strength of Naive Bayes classification Sep 3, 2017

@Mottl Mottl changed the title from [MRG] Changed epsilon to improve prediction strength of Naive Bayes classification to Changed epsilon to improve prediction strength of Naive Bayes classification Sep 3, 2017

@jnothman
Member

jnothman commented Sep 3, 2017

On what basis can we assume/know this will not degrade performance elsewhere?

@jnothman
Member

jnothman commented Sep 3, 2017

It's obviously changing other results in our tests

@Mottl Mottl changed the title from Changed epsilon to improve prediction strength of Naive Bayes classification to [WIP] Changed epsilon to improve prediction strength of Naive Bayes classification Sep 4, 2017

@agramfort
Member

agramfort commented Sep 4, 2017

I agree with @jnothman

If you want to do something like this, I would add a parameter like min_std so you can adjust it.

Mottl added some commits Sep 5, 2017

@Mottl
Contributor

Mottl commented Sep 5, 2017

I've added a min_variance parameter to GaussianNB(), which by default is computed as 1e-9 multiplied by the maximum variance across all dimensions. It behaves much like adding an epsilon to the variance, as in the current code.
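
As an illustration of the default described here (a rough sketch, not the actual diff), the floor is tied to the scale of the data via its largest per-feature variance:

```python
import numpy as np

# Illustrative only: how a data-dependent epsilon could be derived and applied.
X = np.random.RandomState(0).rand(100, 5)

var_smoothing = 1e-9
epsilon = var_smoothing * np.var(X, axis=0).max()  # scale-aware default

sigma = np.var(X, axis=0)               # per-feature variances
sigma_min = np.maximum(sigma, epsilon)  # "min_variance": clip from below
sigma_add = sigma + epsilon             # additive epsilon, as in the current code
```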

@Mottl Mottl changed the title from [WIP] Changed epsilon to improve prediction strength of Naive Bayes classification to [MRG] GaussianNB(): new parameter `min_variance` Sep 5, 2017

@amueller
Member

amueller commented Sep 6, 2017

I was thinking about this a couple of weeks ago and couldn't find any references on it. I think it's a bit unclear whether you want a minimum or an additive constant. I think a Bayesian prior would be an additive constant, right? I guess in practice it doesn't make a lot of difference. Having a reference would be good, though.

@amueller
Member

amueller commented Sep 6, 2017

Actually, a Bayesian prior would pull it towards a specific value, not be an additive constant... That might be the most natural thing to implement, imho?
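
For reference, one way to make this remark concrete (a hedged sketch, not something pursued further in this thread): with the class mean known, a conjugate inverse-gamma prior on σ² gives a posterior mean that shrinks the sample variance toward the prior's value rather than adding a fixed constant.

```latex
% Sketch, assuming x_1,...,x_n ~ N(\mu, \sigma^2) with \mu known and an
% Inverse-Gamma(\alpha, \beta) prior on \sigma^2:
%   \sigma^2 \mid x_{1:n} \sim \mathrm{InvGamma}\!\left(\alpha + \tfrac{n}{2},\;
%       \beta + \tfrac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2\right)
\mathbb{E}\!\left[\sigma^2 \mid x_{1:n}\right]
  = \frac{\beta + \tfrac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2}{\alpha + \tfrac{n}{2} - 1}
% which interpolates between the prior mean \beta/(\alpha - 1) and the sample variance.
```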

@Mottl
Contributor

Mottl commented Sep 7, 2017

Adding ε (or setting a minimum variance) is necessary for a single purpose: to keep the Gaussian PDF computable, since σ appears in the denominator:

f(x) = 1 / (σ √(2π)) · exp(−(x − μ)² / (2σ²))

It is not obvious to me why we should add ε to the variances of all features instead of only those that produce a division-by-zero error.

So it seems to me that it is slightly better to use a minimum variance than to add ε to all variances, but the difference is negligible.
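
A small sketch of the failure mode being described (illustrative only, not the sklearn internals): a constant feature has zero variance, so the per-feature Gaussian log-density blows up unless the variance is smoothed.

```python
import numpy as np

x = np.array([3.0, 3.0, 3.0])    # a constant feature within one class
mu, var = x.mean(), x.var()      # var == 0.0

with np.errstate(divide="ignore", invalid="ignore"):
    log_pdf = -0.5 * (np.log(2 * np.pi * var) + (4.0 - mu) ** 2 / var)
print(log_pdf)                   # nan: -inf + inf without any epsilon

eps = 1e-9 * 1.0                 # e.g. var_smoothing * (a hypothetical max variance of 1.0)
log_pdf = -0.5 * (np.log(2 * np.pi * (var + eps)) + (4.0 - mu) ** 2 / (var + eps))
print(log_pdf)                   # finite (very negative) once the variance is smoothed
```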

@Mottl
Contributor

Mottl commented Sep 7, 2017

So far I can't find any difference between adding epsilon and setting a minimum variance when using naive Bayes. But I found that adding epsilon is more stable when used in non-naive Bayes classification:

Non-naive Bayes classification
==============================
MNIST handwritten digits
Train dataset: (1000, 784), test dataset: (1000, 784)
Score (epsilon=1e-09)       = 0.832
Score (min_variance=1e-09)  = 0.832
Score (epsilon=1e-08)       = 0.919
Score (min_variance=1e-08)  = 0.48
Score (epsilon=1e-07)       = 0.919
Score (min_variance=1e-07)  = 0.479
Score (epsilon=1e-06)       = 0.919
Score (min_variance=1e-06)  = 0.487
Score (epsilon=1e-05)       = 0.919
Score (min_variance=1e-05)  = 0.496
Score (epsilon=0.0001)      = 0.919
Score (min_variance=0.0001) = 0.588
Score (epsilon=0.001)       = 0.921
Score (min_variance=0.001)  = 0.668
Score (epsilon=0.01)        = 0.933
Score (min_variance=0.01)   = 0.729
Score (epsilon=0.1)         = 0.936
Score (min_variance=0.1)    = 0.488
Score (epsilon=1)           = 0.865
Score (min_variance=1)      = 0.928
Score (epsilon=10)          = 0.709
Score (min_variance=10)     = 0.87
Score (epsilon=100)         = 0.351
Score (min_variance=100)    = 0.204

@Mottl Mottl changed the title from [MRG] GaussianNB(): new parameter `min_variance` to [WIP] GaussianNB(): new parameter `min_variance` Sep 8, 2017

@Mottl Mottl changed the title from [WIP] GaussianNB(): new parameter `min_variance` to [WIP] GaussianNB(): new parameter `epsilon` Sep 8, 2017

@Mottl Mottl changed the title from [WIP] GaussianNB(): new parameter `epsilon` to [MRG] GaussianNB(): new parameter `epsilon` Sep 8, 2017

@amueller
Member

amueller commented Sep 8, 2017

What do you mean by non-naive Bayes?

@amueller
Member

amueller commented Sep 8, 2017

btw, this dataset is very non-Gaussian and therefore a very odd example. I think I would usually assume that people use the StandardScaler before GaussianNB and that the data is at least somewhat Gaussian.
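
For completeness, the usage pattern referred to here would look roughly like this (a sketch; X_train, y_train, X_test, y_test are placeholders):

```python
# Scale features before GaussianNB so they are on a comparable scale.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB

clf = make_pipeline(StandardScaler(), GaussianNB())
# clf.fit(X_train, y_train)
# clf.score(X_test, y_test)
```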

@Mottl
Contributor

Mottl commented Sep 8, 2017

Non-naive Bayes means using the full covariance matrix instead of just a variance vector in the multivariate Gaussian PDF (see scipy.stats.multivariate_normal.logpdf()).
I'm planning to add a non-naive Bayes class in another PR in a couple of days. That's why sharing the same epsilon parameter becomes somewhat better.
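
A rough sketch of that "non-naive" idea (illustrative only, not part of this PR): estimate one full covariance matrix per class and score with scipy.stats.multivariate_normal.logpdf.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_full_cov_bayes(X, y):
    # Per-class mean, full covariance, and log prior.
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0),
                     np.cov(Xc, rowvar=False),
                     np.log(len(Xc) / len(X)))
    return params

def predict_full_cov_bayes(params, X):
    classes = sorted(params)
    # Log joint = class-conditional log density + log prior, per class.
    log_joint = np.column_stack([
        multivariate_normal.logpdf(X, mean=params[c][0], cov=params[c][1],
                                   allow_singular=True) + params[c][2]
        for c in classes
    ])
    return np.array(classes)[log_joint.argmax(axis=1)]
```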

@Mottl
Contributor

Mottl commented Sep 8, 2017

What dataset do you suggest for testing Naive Bayes?

@agramfort
Member

agramfort commented Sep 12, 2017

Non-naive Bayes with a full covariance matrix is quadratic discriminant analysis (aka QDA), which is already in sklearn.
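
For reference, the estimator mentioned here is available as follows (a minimal sketch; reg_param blends the per-class covariance estimates with the identity, playing a role similar to the smoothing discussed in this PR):

```python
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

qda = QuadraticDiscriminantAnalysis(reg_param=0.1)
# qda.fit(X_train, y_train)
# qda.score(X_test, y_test)
```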

@agramfort
Member

agramfort commented Sep 12, 2017

epsilon is too generic a name. Please try to improve the variable name and docstring. I am not opposed to exposing this internal epsilon parameter.

@Mottl
Contributor

Mottl commented Sep 12, 2017

Does smoothing look better than epsilon?

@agramfort
Member

agramfort commented Sep 13, 2017

@Mottl
Contributor

Mottl commented Sep 13, 2017

It is not a scaling parameter, but an additive one. And since 3de55d0 it is not a minimum variance but a value added to all variances.
var_smoothing?
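
A tiny sketch of the two behaviours being contrasted here (illustrative only, not the PR diff):

```python
import numpy as np

sigma = np.array([0.0, 1e-12, 0.5, 2.0])  # per-feature variances
eps = 1e-9 * sigma.max()                   # var_smoothing * largest variance

clipped  = np.maximum(sigma, eps)  # earlier "min_variance" behaviour: a floor
smoothed = sigma + eps             # behaviour since 3de55d0: add eps to every variance
```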

@agramfort
Member

agramfort commented Sep 14, 2017

Mottl added some commits Sep 15, 2017

@Mottl
Contributor

Mottl commented Sep 15, 2017

@agramfort
Done.

@agramfort agramfort changed the title from [MRG] GaussianNB(): new parameter `epsilon` to [MRG+1] GaussianNB(): new parameter `epsilon` Sep 17, 2017

@agramfort
Member

agramfort commented Sep 17, 2017

+1 for MRG

@jnothman

Looks great! Please add an entry to doc/whats_new, and we'll merge. Thanks!

@jnothman jnothman changed the title from [MRG+1] GaussianNB(): new parameter `epsilon` to [MRG+2] GaussianNB(): new parameter `epsilon` Sep 18, 2017

@lesteve
Member

lesteve commented Sep 18, 2017

I added the PR link in whats_new; this way it is easier to find details about the change.

@Mottl Mottl changed the title from [MRG+2] GaussianNB(): new parameter `epsilon` to [MRG+2] GaussianNB(): new parameter `var_smoothing` Sep 18, 2017

@lesteve
Member

lesteve commented Sep 18, 2017

This one should be merged when the CIs are green.

@Mottl
Contributor

Mottl commented Sep 18, 2017

Thank you all

@lesteve
Member

lesteve commented Sep 18, 2017

Merging, thanks a lot @Mottl!

@lesteve lesteve merged commit 08b524d into scikit-learn:master Sep 18, 2017

6 checks passed

ci/circleci Your tests passed on CircleCI!
codecov/patch 100% of diff hit (target 96.17%)
codecov/project 96.17% (+<.01%) compared to 2dd1574
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
lgtm analysis: Python No alert changes

@Mottl Mottl deleted the Mottl:native_bayes_epsilon branch Sep 18, 2017

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
