# Added Multinomial Naive Bayes classifier. #107

Closed. 11 commits, +844 −236.

## Conversation

7 participants
Contributor

### amitibo commented Mar 21, 2011

 Added Multinomial Naive Bayes classifier. Updated the documents and tests. Amit

### amitibo added some commits Mar 21, 2011

 Added Multinomial Naive Bayes classifier 
 833695a 
 Fix to the documentation of the Multinomial Naive Bayes. 
 14a66c0 
Owner

### mblondel commented Mar 22, 2011

Thank you very much for your contribution. A few remarks:

Style:

- you don't need to add an extra space after parentheses
- you don't need to add an extra sharp sign in your comments

Functionality:

- predict should be implemented without predict_log_proba: the argmax is independent of the denominator
- support for sparse matrices would be nice (then add it to the 20 newsgroup example and optimize performance)

Tests:

- you should test that np.log(clf.predict_proba(X)) gives the same results as clf.predict_log_proba(X)
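The two technical remarks can be illustrated with a small NumPy sketch. It is not the code from this pull request: the function names mirror the estimator methods mentioned above, while `joint_log_likelihood` and the toy parameters are assumptions made for illustration. Normalizing by the denominator subtracts the same constant from every class score of a sample, so the argmax is unchanged, and `np.log(predict_proba(X))` should agree with `predict_log_proba(X)`.

```python
import numpy as np


def joint_log_likelihood(X, log_prior, log_theta):
    """Unnormalized scores log P(c) + sum_i x_i * log(theta_ci).

    Returns an array of shape (n_samples, n_classes).
    """
    return log_prior + X @ log_theta.T


def predict(X, log_prior, log_theta):
    # The argmax does not need the denominator: it is constant within a row.
    return np.argmax(joint_log_likelihood(X, log_prior, log_theta), axis=1)


def predict_log_proba(X, log_prior, log_theta):
    jll = joint_log_likelihood(X, log_prior, log_theta)
    # Log of the denominator, computed stably (log-sum-exp over classes).
    m = jll.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(jll - m).sum(axis=1, keepdims=True))
    return jll - log_norm


def predict_proba(X, log_prior, log_theta):
    return np.exp(predict_log_proba(X, log_prior, log_theta))


# Toy model (illustrative values): 2 classes, vocabulary of 3 words.
log_prior = np.log(np.array([0.5, 0.5]))
log_theta = np.log(np.array([[0.7, 0.2, 0.1],
                             [0.1, 0.2, 0.7]]))
X = np.array([[3, 1, 0],
              [0, 1, 4]])

labels = predict(X, log_prior, log_theta)
log_proba = predict_log_proba(X, log_prior, log_theta)
proba = predict_proba(X, log_prior, log_theta)
```

A test along the lines suggested above would then assert `np.allclose(np.log(proba), log_proba)` and check that `labels` equals the argmax of `log_proba`.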
Contributor

### amitibo commented Mar 22, 2011

Thanks for the tips. Is the style issue strict? This is how I write my code.

Amit

On 22/03/2011 07:50, mblondel wrote:

> Thank you very much for your contribution. A few remarks:
>
> Style:
>
> - you don't need to add an extra space after parentheses
> - you don't need to add an extra sharp sign in your comments
>
> Functionality:
>
> - predict should be implemented without predict_log_proba: the argmax is independent of the denominator
> - support for sparse matrices would be nice (then add it to the 20 newsgroup example and optimize performance)
>
> Tests:
>
> - you should test that np.log(clf.predict_proba(X)) gives the same results as clf.predict_log_proba(X)
Owner

### ogrisel commented Mar 22, 2011

Thanks for the contrib. Yes, the style is strict :) Please use the pep8 utility to spot style issues: http://pypi.python.org/pypi/pep8

As @mblondel said, please also write a test case that demonstrates the support for scipy.sparse inputs, and include the MNB model in the document classification example: http://scikit-learn.sourceforge.net/auto_examples/document_classification_20newsgroups.html

Also, please always use lowercase variable names (except in the special case of 2D numpy arrays or scipy matrices).

We will also require a new section in the documentation about this model, referencing the examples where it is used. If I am not mistaken, the user guide currently lacks a section on Bayesian classifiers: http://scikit-learn.sourceforge.net/user_guide.html Such a section should be added in the doc/ folder in the source tree and referenced from the table of contents. Use the SGD or SVM documentation as a reference, for instance.
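A test of the kind requested here could compare class scores computed from a dense array with scores computed from the same data stored as a `scipy.sparse` matrix. The sketch below uses a hypothetical `joint_log_likelihood` helper and made-up parameters rather than the estimator from this pull request; it relies only on a sparse-times-dense product giving the same values as the dense product.

```python
import numpy as np
import scipy.sparse as sp


def joint_log_likelihood(X, log_prior, log_theta):
    """Class scores for dense or scipy.sparse X, shape (n_samples, n_classes)."""
    # The product works for both input types; np.asarray guards against
    # matrix-like return types.
    return np.asarray(X @ log_theta.T) + log_prior


def test_sparse_matches_dense():
    rng = np.random.RandomState(0)
    X_dense = rng.randint(5, size=(6, 100))
    X_sparse = sp.csr_matrix(X_dense)

    # Illustrative parameters: 3 classes, each row of theta sums to one.
    theta = rng.rand(3, 100) + 1e-3
    theta /= theta.sum(axis=1, keepdims=True)
    log_theta = np.log(theta)
    log_prior = np.log(np.full(3, 1.0 / 3))

    dense_scores = joint_log_likelihood(X_dense, log_prior, log_theta)
    sparse_scores = joint_log_likelihood(X_sparse, log_prior, log_theta)

    assert sparse_scores.shape == (6, 3)
    assert np.allclose(dense_scores, sparse_scores)
    return dense_scores, sparse_scores


dense_scores, sparse_scores = test_sparse_matches_dense()
```

Keeping the input sparse matters for text data, where most entries of the term-count matrix are zero.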
Owner

### mblondel commented Mar 22, 2011

 A few people in the project really like pep8-style :). Having everyone using the same style conventions makes reading the code just more comfortable.

### ogrisel reviewed Mar 22, 2011

#### ogrisel Mar 22, 2011

Owner

The "See also" section lacks a reference to the GNB class explaining how the two differ in practice.

### amitibo added some commits Mar 28, 2011

 Pep 8 compliance and cleanup for the multinomial naive bayes 
 ceaf255 
 Merge remote branch 'upstream/master' 
 59df234 
Owner

### fabianp commented Mar 30, 2011

Thanks for making the PEP8 changes. One thing that should be improved is the docstring for the public methods: it should include at least a description of the input and output values. Note that GNB was previously lacking in this respect; I just fixed it in 4c1fb9f.

Other things:

- Estimated parameter theta_c* should be documented in the docstring.
- I'm not sure about names such as alpha_i and theta_i. I usually use such names to represent items from an array, but that is not the case here. What do you think? Also, if these are the standard names, please provide references.

Other things I noted, which can wait until after the merge, are in issue #108.
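As an illustration of the kind of docstring being requested, here is a sketch in the NumPy docstring style. The function `predict_from_scores` is hypothetical, not a method from this pull request; it only shows how input and output values can be described.

```python
import numpy as np


def predict_from_scores(scores, classes):
    """Pick the most likely class for each sample.

    Parameters
    ----------
    scores : array-like, shape = [n_samples, n_classes]
        Per-class scores (for example joint log likelihoods).
    classes : array-like, shape = [n_classes]
        Class labels, in the same order as the columns of ``scores``.

    Returns
    -------
    y : array, shape = [n_samples]
        Predicted class label for each row of ``scores``.
    """
    scores = np.asarray(scores)
    classes = np.asarray(classes)
    return classes[np.argmax(scores, axis=1)]


y = predict_from_scores([[-1.0, -3.0], [-5.0, -2.0]], ["ham", "spam"])
```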

### amitibo added some commits Mar 30, 2011

 Some more pep8 
 7ae1566 
 Merge branch 'master' of git://github.com/scikit-learn/scikit-learn 
Conflicts:
scikits/learn/naive_bayes.py
 ade11c3 
Contributor

### amitibo commented Mar 31, 2011

Thank you for the comments. I am a bit slow on the commits due to workload, but I will take care of it.

Amit

On 30/03/2011 12:58, fabianp wrote:

> Thanks for making the PEP8 changes. One thing that should be improved is the docstring for the public methods: it should include at least a description of the input and output values. Note that GNB was previously lacking in this respect; I just fixed it in 4c1fb9f.
>
> Other things:
>
> - Estimated parameter theta_c* should be documented in the docstring.
> - I'm not sure about names such as alpha_i and theta_i. I usually use such names to represent items from an array, but that is not the case here. What do you think? Also, if these are the standard names, please provide references.
>
> Other things I noted, which can wait until after the merge, are in issue #108.

### unknown added some commits Apr 11, 2011

 Added documentation for the Naive Bayes classifiers. 
 9e60a3f 
 Added sparse MNNB and modified the textual examples to benchmark it. 
 5c6a453 
 Modified the Naive Bayes nose tests to the new location of the module and added sparse test.
 cdd0ef7 

### GaelVaroquaux commented on 5c6a453 Apr 12, 2011

 "MNNB" is a name that is very hard to understand. I would like fewer acronyms in the scikit (GNB is a good example of a bad name). MultiNB or MultinomialNB would be a better name, I believe.
Owner

### mblondel commented Apr 13, 2011

 I completely forgot that I had started some work on a multinomial naive bayes branch. It supports sparse matrices, semi-supervised learning and complement naive bayes. It needs to be polished and it is still unclear how to handle the semi-supervised case in the API (I used fit_semi for now). I don't have time to work on that unfortunately, but feel free to start from my branch, copy snippets of code or whatever is useful to you.

### amitibo added some commits May 11, 2011

 Merge remote branch 'upstream/master' 
Conflicts:
scikits/learn/naive_bayes.py
 595bf83 
 naive bayes name change MNNB->MultinomialNB 
 3dd1d00 
Owner

### larsmans commented May 19, 2011

 I did some cleanup of @mblondel's code, it's in my branch naive-bayes.
Owner

### mblondel commented May 19, 2011

 @larsmans Thanks a lot for doing this. We really need a robust multinomial naive bayes in the scikit. to_1_of_K should be replaced by LabelBinarizer. Also, since we haven't agreed on a semi-supervised API, we might want to throw this part away and keep it for another branch. If we do keep it, we should probably have a discussion on the mailing list to decide the API. What do you think?
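For readers unfamiliar with the 1-of-K encoding that `to_1_of_K` and `LabelBinarizer` refer to, here is a minimal NumPy sketch. The helper `one_hot` is a hypothetical stand-in written for this illustration, not the code from either branch, and it does not reproduce every behavior of `LabelBinarizer`. It also shows why the encoding is convenient for multinomial naive Bayes: per-class feature counts become a single matrix product.

```python
import numpy as np


def one_hot(y):
    """Return (classes, Y) where Y[i, k] is 1 if y[i] == classes[k], else 0."""
    classes, indices = np.unique(y, return_inverse=True)
    indices = indices.ravel()
    Y = np.zeros((len(indices), len(classes)), dtype=int)
    Y[np.arange(len(indices)), indices] = 1
    return classes, Y


y = np.array(["spam", "ham", "spam", "eggs"])
X = np.array([[2, 0],
              [0, 1],
              [1, 1],
              [0, 3]])

classes, Y = one_hot(y)
# Summing the rows of X per class: shape (n_classes, n_features).
feature_counts = Y.T @ X
```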
Owner

### larsmans commented May 19, 2011

 @mblondel: done what you suggested, removed semi-sup and ComplementNB as well. Please pull and inspect.

### agramfort reviewed May 20, 2011

    @@ -7,7 +7,28 @@ Naive Bayes
     **Naive Bayes** algorithms are a set of supervised learning methods
     based on applying Baye's theorem with strong (naive) independence

Owner

s/Baye's/Bayes'

### agramfort reviewed May 20, 2011

    @@ -7,7 +7,28 @@ Naive Bayes
     **Naive Bayes** algorithms are a set of supervised learning methods
     based on applying Baye's theorem with strong (naive) independence
    -assumptions.
    +assumptions. Given a class variable :math:`c` and a dependent set
    +of feature variables :math:`f_1` through :math:`f_n`, the bayes

Owner

s/bayes/Bayes

### agramfort reviewed May 20, 2011

    +multinomial. The distribution is parametrized by the vector
    +:math:`\overline{\theta_c} = (\theta_{c1},\ldots,\theta_{cn})` where :math:`c`
    +is the class of document, :math:`n` is the size of the vocabulary and :math:`\theta_{ci}`
    +is the prbability of word :math:`i` appearing in a document of class :math:`c`.

#### agramfort May 20, 2011

Owner

s/prbability/probability
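
For reference, the parametrization described in this hunk corresponds to the usual multinomial likelihood. The formulas below are the standard textbook form, not text from the pull request; the smoothing parameter $\alpha$ and the counts $N_{ci}$ and $N_c$ are notation assumed here.

```latex
P(x \mid c) \propto \prod_{i=1}^{n} \theta_{ci}^{x_i},
\qquad
\hat{\theta}_{ci} = \frac{N_{ci} + \alpha}{N_c + \alpha n},
\qquad
N_c = \sum_{i=1}^{n} N_{ci}
```

Here $x_i$ is the count of word $i$ in the document, $N_{ci}$ is the count of word $i$ over the training documents of class $c$, and $\alpha > 0$ keeps unseen words from getting zero probability.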

### agramfort reviewed May 20, 2011

    + """
    + Multinomial Naive Bayes (MultinomialNB)
    +
    + The Multinomial Naive Bayes classifier is suitable for text classification.

#### agramfort May 20, 2011

Owner

is it really only suitable for text?

### agramfort reviewed May 20, 2011

    + Examples
    + --------
    + >>> import numpy as np
    + >>> X = np.random.randint( 5, size=(6, 100) )

#### agramfort May 20, 2011

Owner

no space after "(" and before ")"
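
With the spacing fixed, the flagged line could read as follows. Only the whitespace differs from the line under review; the shape printout is added here for illustration.

```python
import numpy as np

# No space after "(" or before ")", as PEP 8 recommends.
X = np.random.randint(5, size=(6, 100))
print(X.shape)
```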

Owner

### larsmans commented May 22, 2011

 I discussed with @amitibo and have adopted this branch. See new pull request; closing this one.