Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multinomial Bayes issue #5814

Closed
mladenk42 opened this issue Nov 13, 2015 · 8 comments
Closed

Multinomial Bayes issue #5814

mladenk42 opened this issue Nov 13, 2015 · 8 comments
Labels
Bug
Milestone

Comments

@mladenk42
Copy link

@mladenk42 mladenk42 commented Nov 13, 2015

Given the digits dataset (available in sklearn.datasets), we split it into train and test set.
We fit a MultinomialNB classifier on the train set, and generate predictions on that same train
set. When this is done without smoothing, the classification performance is rather low.
This is contrary to expectations, as the classifier has already seen all possible feature values.
So not including smoothing shouldn't really make such a big difference.

It seems the BernoulliNB class from sklearn also has this problem.

My inefficient but straightforward implementation performs expectedly well on the training set.
Code to reproduce the issue with some more tests is available at http://pastebin.com/2hsrA8xL
I hope the issue is not caused by some trivial implementation detail I've overlooked.

@agramfort
Copy link
Member

@agramfort agramfort commented Nov 14, 2015

@mladenk42
Copy link
Author

@mladenk42 mladenk42 commented Nov 14, 2015

Might be a silly question but where do i send it :)? The scikit-learn-issues mailing list?
Thanks.

@agramfort
Copy link
Member

@agramfort agramfort commented Nov 14, 2015

@mladenk42
Copy link
Author

@mladenk42 mladenk42 commented Nov 15, 2015

Since it's my first time submitting an error report I'm completely lost :(

It says here that PR-s are used to tell others about changes I pushed to a repository. I didn't push anything O_o, nor do I want to. The demo script I submitted (via pastebin link) doesn't fix the issue, or change anything in sklearn code. It's just there to reproduce the issue, so someone who knows the sklearn code can debug it more easily. Perhaps I'm misunderstanding something.

@absolutelyNoWarranty
Copy link
Contributor

@absolutelyNoWarranty absolutelyNoWarranty commented Nov 19, 2015

It seems when alpha=0.0 and the data has features which never change, feature_log_prob_ has -inf which causes the calculations down the line to become all nan.

Given
X = np.array([[1, 0],[1, 1]])
y = np.array([0, 1])

Compare

nb = BernoulliNB(alpha=0.)
nb.fit(X, y)

nb.predict_proba(X)
Out[135]:
array([[ nan,  nan],
       [ nan,  nan]])

nb.feature_log_prob_
Out[139]:
array([[  0., -inf],
       [  0.,   0.]])

and

nb = BernoulliNB(alpha=1e-15)
nb.fit(X, y)
nb.predict_proba(X)

Out[136]:
array([[  1.0000e+00,   8.8818e-16],
       [  1.0000e-15,   1.0000e+00]])
@amueller amueller added the Bug label Sep 14, 2016
@amueller
Copy link
Member

@amueller amueller commented Sep 14, 2016

We should probably throw an error when that happens if that creates issues.

@amueller amueller added this to the 0.19 milestone Sep 14, 2016
yl565 added a commit to yl565/scikit-learn that referenced this issue Sep 23, 2016
@yl565
Copy link
Contributor

@yl565 yl565 commented Sep 24, 2016

@amueller Please see my PR to this issue.

yl565 added a commit to yl565/scikit-learn that referenced this issue Oct 17, 2016
herilalaina added a commit to herilalaina/scikit-learn that referenced this issue Jun 15, 2017
@jmschrei
Copy link
Member

@jmschrei jmschrei commented Jun 19, 2017

Fixed via #9131

jmschrei added a commit that referenced this issue Jun 19, 2017
…9131)

* Fix #5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
dmohns added a commit to dmohns/scikit-learn that referenced this issue Aug 7, 2017
…cikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
dmohns added a commit to dmohns/scikit-learn that referenced this issue Aug 7, 2017
…cikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
NelleV added a commit to NelleV/scikit-learn that referenced this issue Aug 11, 2017
…cikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
paulha added a commit to paulha/scikit-learn that referenced this issue Aug 19, 2017
…cikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this issue Aug 29, 2017
…cikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this issue Nov 15, 2017
…cikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this issue Dec 18, 2017
…cikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
scikit-learn 0.19
Issues Without PR
Linked pull requests

Successfully merging a pull request may close this issue.

6 participants
You can’t perform that action at this time.