# scikit-learn/scikit-learn


# WIP: Semisupervised Naive Bayes using Expectation Maximization #430

Closed
wants to merge 20 commits into from
+446 −11

### 5 participants

Owner

Here's the EM algorithm for semisupervised Naive Bayes. The implementation checks for convergence based on the coefficients, following the advice of Bishop (2006), so it could be used more generally for linear classifier self-training; however, I only implemented the necessary machinery (fitting on a 1-of-K vector) on the discrete NB estimators. We might want to switch to log-likelihood based convergence checking later.
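The loop being described can be illustrated with a toy stand-in (this is not the PR's code: it uses the public `MultinomialNB` API with hard self-labeling via `predict` instead of the 1-of-K soft machinery, and checks convergence on `feature_log_prob_`):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def em_nb_sketch(X, y, n_iter=10, tol=1e-3):
    """Toy EM/self-training loop: start from a supervised model, re-fit on
    the model's own predictions, and stop when the coefficients change by
    less than tol per coefficient (rather than tracking log-likelihood)."""
    clf = MultinomialNB()
    labeled = y != -1                    # -1 marks unlabeled samples
    clf.fit(X[labeled], y[labeled])      # initial supervised model
    old = clf.feature_log_prob_.copy()
    for _ in range(n_iter):
        y_self = clf.predict(X)          # E-step (hard labels for brevity)
        y_self[labeled] = y[labeled]     # never overwrite the true labels
        clf.fit(X, y_self)               # M-step
        if np.abs(old - clf.feature_log_prob_).sum() < tol * X.shape[1]:
            break                        # coefficient-based convergence
        old = clf.feature_log_prob_.copy()
    return clf
```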

Narrative documentation follows if there's interest in this pull request. I've adapted the document classification example script into a new, semisupervised example. We might also merge these scripts.

Owner

Looks very interesting. For lazy people, here is the outcome of the document classification example with 4 of the 20 newsgroups categories, with only 10% of samples in the training set having labels (around 200 labeled samples and 1800 without labels):

```
================================================================================
Baseline: fully supervised Naive Bayes
________________________________________________________________________________
Training:
MultinomialNB(alpha=0.01, fit_prior=True)
train time: 0.009s
test time:  0.003s
f1-score:   0.809
dimensionality: 32101

________________________________________________________________________________
Training:
BernoulliNB(alpha=0.01, binarize=0.0, fit_prior=True)
train time: 0.010s
test time:  0.022s
f1-score:   0.818
dimensionality: 32101

================================================================================
Naive Bayes trained with Expectation Maximization
________________________________________________________________________________
Training:
EMNB(estimator=MultinomialNB(alpha=0.01, fit_prior=True),
     estimator__alpha=0.01, estimator__fit_prior=True, n_iter=10, tol=0.001,
     verbose=False)
train time: 0.197s
test time:  0.003s
f1-score:   0.883
dimensionality: 32101

________________________________________________________________________________
Training:
EMNB(estimator=BernoulliNB(alpha=0.01, binarize=0.0, fit_prior=True),
     estimator__alpha=0.01, estimator__binarize=0.0,
     estimator__fit_prior=True, n_iter=10, tol=0.001, verbose=False)
train time: 0.416s
test time:  0.021s
f1-score:   0.864
dimensionality: 32101
```

Semi-supervised learning is practically important in my opinion because of the cost of annotating data with a supervision signal. It can help annotators bootstrap a process by annotating a small proportion of a dataset. Then we can do some sort of active learning: ask the fitted model for the samples with the least confidence (according to predict_proba, or the decision function being close to the threshold) and have the human annotator label those examples first.
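The least-confidence querying described above can be sketched like this (toy data; `MultinomialNB` stands in for any probabilistic classifier):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.array([[4, 3, 1], [0, 1, 6], [5, 2, 1], [2, 2, 2], [0, 1, 7]])
y = np.array([1, 2, -1, -1, -1])       # -1 marks unlabeled samples

labeled = y != -1
clf = MultinomialNB().fit(X[labeled], y[labeled])

unlabeled_idx = np.flatnonzero(~labeled)
proba = clf.predict_proba(X[unlabeled_idx])
confidence = proba.max(axis=1)          # probability of the top class
# Least confident unlabeled samples first: hand these to the annotator.
query_order = unlabeled_idx[np.argsort(confidence)]
```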

Do you have an idea whether the CPU time of semi-supervised over supervised is always 10x, or is there some asymptotic complexity that makes it not scalable to larger problems (e.g. 100k samples)?

This work is related to the LabelPropagation pull request (which needs a final review, i.e. profiling it and checking whether the eigenproblem cannot be sped up). I would be curious to see whether the label propagation branch could handle sparse input, so as to compare both methods on the 20newsgroups dataset.

sklearn/naive_bayes.py
```diff
((75 lines not shown))
+            print "Naive Bayes EM, iteration %d," % i,
+
+            clf._fit1ofK(X, Y, sample_weight, class_prior)
+
+            d = (np.abs(old_coef - clf.coef_).sum() +
+                 np.abs(old_intercept - clf.intercept_).sum())
+            if self.verbose:
+                print "diff = %.3g" % d
+            if d < tol:
+                if self.verbose:
+                    print "Naive Bayes EM converged"
+                break
+
+            old_coef = np.copy(clf.coef_)
+            old_intercept = np.copy(clf.intercept_)
+            Y = clf.predict_proba(X)
```
Owner

@ogrisel: the time required should be simply n_iter times the time it takes to fit a single NB classifier + some overhead in the convergence checking.

sklearn/naive_bayes.py
```diff
((64 lines not shown))
+        Y = clf._label_1ofK(y)
+
+        unlabeled = np.where(y == -1)[0]
+
+        n_features = X.shape[1]
+        n_classes = Y.shape[1]
+        tol = self.tol * n_features
+
+        old_coef = np.zeros((n_classes, n_features))
+        old_intercept = np.zeros(n_classes)
+
+        for i in xrange(self.n_iter):
+            if self.verbose:
+                print "Naive Bayes EM, iteration %d," % i,
+
+            clf._fit1ofK(X, Y, sample_weight, class_prior)
```
**mblondel** (Owner) added a note, Nov 8, 2011: If I understand the code correctly, in the very first call to this line (first iteration of the loop), Y should include -1 as labels. Does it affect the underlying classifier or does the classifier handle the -1 differently?

**larsmans** (Owner) added a note, Nov 8, 2011: I've hacked the LabelBinarizer to produce uniform probabilities where it encounters an unlabeled sample. I just changed the EMNB code to start from a supervised model, though, so this no longer happens.
Owner
• I'm wondering if the view (slicing?) X[y == -1] has an impact on performance. Before we merge this branch and the label propagation one, I would really like to clear that up. Adopting the convention that unlabeled examples should always be at the end of the dataset may help.

• It seems to me that we may not want to tie this EM class to Naive Bayes: we should be able to apply the same kind of EM algorithm to any classifier, with soft labeling using predict_proba if available and hard labeling with predict if not. We need to establish a way to know what the model parameters are, though (for the convergence check).

• A plot of accuracy vs. percentage of labeled data would be nicer than text-based output :)

Owner

I guess that if we check convergence based on the classifier output (training set accuracy), then this becomes a very general self-training algorithm.

Turns out the slicing is indeed incredibly expensive. I just cached it and training time decreased by a factor of five!

Owner

This is array masking (which indeed causes a memory allocation) rather than slicing, which is fast because it creates a cheap view without memory allocation.
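For reference, a quick NumPy check of the difference (dense arrays only; sparse matrix indexing generally returns copies in both cases):

```python
import numpy as np

X = np.zeros((4, 3))

view = X[1:3]               # basic slicing: a cheap view, no allocation
assert view.base is X       # shares X's buffer

mask = X[np.array([1, 2])]  # fancy/mask-style indexing: allocates a copy
assert mask.base is None    # owns its own buffer
```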

Owner

@ogrisel: I might not be familiar enough with NumPy/SciPy's masking and slicing yet, but I just picked the option that works with sparse matrices, since text classification is the intended use case. We can add switching on type for smarter labeled/unlabeled selection later; currently, EMNB wants a BaseDiscreteNB anyway, since I don't have a use case for Gaussian NB. Please try the demo on the full 20newsgroups set and decide whether the performance is ok.

@mblondel: as regards extending this to other linear classifiers, I suggest we keep EMNB and maybe later introduce a new, generalized self-training meta-estimator for other classifiers. Naive Bayes EM has plenty of interesting routes for expansion, including optimizing for likelihood and the new SFE algorithm, so any duplication is not necessarily harmful.

I'm willing to write docs and fix bugs, but not to put very much more time into this PR for now.

Strange, the following is what I got. EM made the accuracy drop.

```
================================================================================
Baseline: fully supervised Naive Bayes
________________________________________________________________________________
Training:
MultinomialNB(alpha=0.01, fit_prior=True)
train time: 0.008s
test time:  0.004s
f1-score:   0.799
dimensionality: 32101

________________________________________________________________________________
Training:
BernoulliNB(alpha=0.01, binarize=0.0, fit_prior=True)
train time: 0.007s
test time:  0.021s
f1-score:   0.799
dimensionality: 32101

================================================================================
Naive Bayes trained with Expectation Maximization
________________________________________________________________________________
Training:
EMNB(estimator=MultinomialNB(alpha=0.01, fit_prior=True),
     estimator__alpha=0.01, estimator__fit_prior=True, n_iter=10, tol=0.001,
     verbose=False)
train time: 0.057s
test time:  0.004s
f1-score:   0.761
dimensionality: 32101

________________________________________________________________________________
Training:
EMNB(estimator=BernoulliNB(alpha=0.01, binarize=0.0, fit_prior=True),
     estimator__alpha=0.01, estimator__binarize=0.0,
     estimator__fit_prior=True, n_iter=10, tol=0.001, verbose=False)
train time: 0.101s
test time:  0.020s
f1-score:   0.790
dimensionality: 32101
```
Owner

@fannix: how much labeled and unlabeled data did you use?

@larsmans: It was just the demo examples.

Owner

Yes, this is the intended result. When no label is given, assume uniform probability. I'll look into the performance next week.
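The behaviour in question amounts to something like the following toy encoder (a stand-in for the modified LabelBinarizer, not its actual code):

```python
import numpy as np

def label_to_1ofk(y, classes):
    """Toy 1-of-K encoder: unlabeled samples (-1) get a uniform row."""
    n_classes = len(classes)
    Y = np.zeros((len(y), n_classes))
    for i, label in enumerate(y):
        if label == -1:
            Y[i, :] = 1.0 / n_classes   # uniform class memberships
        else:
            Y[i, list(classes).index(label)] = 1.0
    return Y
```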

Owner

@fannix: I've reproduced the bad performance you saw. This seems to be due to commit b5bc737, i.e. only re-labeling the unlabeled samples.

Owner

@mblondel: I'm writing docs, care to review the code?

sklearn/naive_bayes.py
```diff
((17 lines not shown))
+        Maximum number of iterations.
+    relabel_all : bool, optional
+        Whether to re-estimate class memberships for labeled samples as well.
+        Disabling this may result in bad performance, but follows Nigam et al.
+        closely.
+    tol : float, optional
+        Tolerance, per coefficient, for the convergence criterion.
+        Convergence is determined based on the coefficients (log probabilities)
+        instead of the model log likelihood.
+    verbose : boolean, optional
+        Whether to print progress information.
+    """
+
+    def __init__(self, estimator, n_iter=10, relabel_all=True, tol=1e-3,
+                 verbose=False):
+        self.estimator = estimator
```
**mblondel** (Owner) added a note, Dec 19, 2011: shall we raise an error if estimator is not a BaseDiscreteNB?

**larsmans** (Owner) added a note, Dec 19, 2011: Good point. I want to extend it to handle BaseNB first.
Owner

The estimator and the modifications you made to LabelBinarizer look good to me.

Regarding the example, it would be nice to modify the 20 newsgroup dataset loader so as to return ready-to-use features. This way, we wouldn't have to call Vectorizer (the purpose of the example is to illustrate semi-supervised learning, not feature extraction). If you feel like implementing it, a plot with the proportion of labeled data as x-axis and the accuracy as y-axis would be nice, but this PR can be merged without it IMO.

Owner

@mblondel I agree that this code gets duplicated in many examples.

I think we should have a `load_vectorized_20newsgroups` utility in this module that uses a joblib memoizer to cache the results of the vectorization in the data_home folder.
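The caching pattern being proposed could look roughly like this (the function name and cache location are placeholders; a tiny in-memory corpus stands in for 20newsgroups to keep the sketch self-contained):

```python
import tempfile
from joblib import Memory
from sklearn.feature_extraction.text import CountVectorizer

# In the real utility the cache would live under data_home.
memory = Memory(tempfile.mkdtemp(), verbose=0)

@memory.cache
def vectorize(docs):
    """Expensive step whose result joblib caches on disk."""
    return CountVectorizer().fit_transform(docs)

docs = ["unlabeled text data", "labeled text data"]
X1 = vectorize(docs)   # computed on the first call
X2 = vectorize(docs)   # loaded from the on-disk cache on the second call
```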

Owner

Gone from WIP to MRG. I think I'm done for the moment.

Owner

Doctest failure

```
File "scikit-learn/sklearn/preprocessing/__init__.py", line 494, in sklearn.preprocessing.LabelBinarizer
Failed example:
    clf.transform([1, 6])
Expected:
    array([[ 1.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  1.]])
Got:
    array([[ 0.25,  0.25,  0.25,  0.25],
           [ 0.  ,  0.  ,  0.  ,  1.  ]])

>>  raise self.failureException(self.format_failure(<StringIO.StringIO instance>.getvalue()))
```

Owner

doc/modules/naive_bayes.rst
```diff
@@ -173,3 +173,48 @@ It is advisable to evaluate both models, if time permits.
     `_ 3rd Conf. on Email and Anti-Spam (CEAS).
+
+.. _semisupervised_naive_bayes:
+
+Semisupervised training with EM
+-------------------------------
+
+The class ``SemisupervisedNB`` implements the expectation maximization (EM)
```
**amueller** (Owner) added a note, Dec 20, 2011: :class:`SemisupervisedNB`
doc/modules/naive_bayes.rst
```diff
@@ -173,3 +173,48 @@ It is advisable to evaluate both models, if time permits.
     `_ 3rd Conf. on Email and Anti-Spam (CEAS).
+
+.. _semisupervised_naive_bayes:
+
+Semisupervised training with EM
+-------------------------------
+
+The class ``SemisupervisedNB`` implements the expectation maximization (EM)
+algorithm for semisupervised training of Naive Bayes models,
+where a part of the training samples are unlabeled.
+Unlabeled data are indicated by a ``-1`` value in the label vector.
```
**amueller** (Owner) added a note, Dec 20, 2011: Why "-1"? That makes it harder to use with binary classification. I would prefer if unlabeled data had the label "nan".

**fannix** added a note, Dec 20, 2011: That is the convention used in LabelBinarizer. I also think this convention is very confusing. Many datasets use +1/-1 as positive/negative encoding. …

**larsmans** (Owner) added a note, Dec 20, 2011: nan is hard to handle since nan != nan. We've been avoiding it everywhere so far, while we have been using -1 for outliers in both DBSCAN and OneClassSVM.

**amueller** (Owner) added a note, Dec 20, 2011: Agreed.
commented on the diff
doc/modules/naive_bayes.rst
```diff
((9 lines not shown))
+
+The class ``SemisupervisedNB`` implements the expectation maximization (EM)
+algorithm for semisupervised training of Naive Bayes models,
+where a part of the training samples are unlabeled.
+Unlabeled data are indicated by a ``-1`` value in the label vector.
+
+This EM algorithm fits an initial model, then iteratively
+
+ * uses the current to predict fractional class memberships;
+ * fits a new model on its own predictions
+
+until convergence.
+Convergence is determined by measuring the difference
+between subsequent models' parameter vectors.
+Note that this differs from the typical treatment of
+EM for Naive Bayes in the literature,
```
**amueller** (Owner) added a note, Dec 20, 2011: I think it should explicitly be "Semi-supervised Naive Bayes".
doc/modules/naive_bayes.rst
```diff
((13 lines not shown))
+Unlabeled data are indicated by a ``-1`` value in the label vector.
+
+This EM algorithm fits an initial model, then iteratively
+
+ * uses the current to predict fractional class memberships;
+ * fits a new model on its own predictions
+
+until convergence.
+Convergence is determined by measuring the difference
+between subsequent models' parameter vectors.
+Note that this differs from the typical treatment of
+EM for Naive Bayes in the literature,
+where convergence is usually checked by computing
+the log-likelihood of the model given the training samples.
+
+``SemisupervisedNB`` is a meta-estimator that builds upon
```
**amueller** (Owner) added a note, Dec 20, 2011: This needs to be changed to reflect our discussion just now.
commented on the diff
doc/modules/naive_bayes.rst
```diff
((25 lines not shown))
+where convergence is usually checked by computing
+the log-likelihood of the model given the training samples.
+
+``SemisupervisedNB`` is a meta-estimator that builds upon
+a regular Naive Bayes estimator.
+To use this class, construct it with an ordinary Naive Bayes model as follows::
+
+    >>> from sklearn.naive_bayes import MultinomialNB, SemisupervisedNB
+    >>> clf = SemisupervisedNB(MultinomialNB())
+    >>> clf
+    SemisupervisedNB(estimator=MultinomialNB(alpha=1.0, fit_prior=True),
+             n_iter=10, relabel_all=True, tol=0.001, verbose=False)
+
+Then use ``clf.fit`` as usual.
+
+.. note::
```
commented on the diff
examples/semisupervised_document_classification.py
```diff
((180 lines not shown))
+
+    if opts.print_report:
+        print "classification report:"
+        print metrics.classification_report(y_test, pred,
+                                            target_names=categories)
+
+    if opts.print_cm:
+        print "confusion matrix:"
+        print metrics.confusion_matrix(y_test, pred)
+
+    print
+    return score, train_time, test_time
+
+print 80 * '='
+print "Baseline: fully supervised Naive Bayes"
+benchmark(MultinomialNB(alpha=.01), supervised=True)
```
**amueller** (Owner) added a note, Dec 20, 2011: From the output it is not clear to me what the difference between multinomial NB and binary NB is. How are the features used in these two cases? Or is there something in the narrative docs about this use case?

**larsmans** (Owner) added a note, Dec 20, 2011: (For the record,) the predict algorithm (posterior computation) is different for the multinomial and Bernoulli event models. This is described in the narrative docs, with references: http://scikit-learn.org/dev/modules/naive_bayes.html

**amueller** (Owner) added a note, Dec 20, 2011: Ok. Sorry, should have looked before complaining.
commented on the diff
examples/semisupervised_document_classification.py
```diff
@@ -0,0 +1,201 @@
+"""
+===============================================
+Semisupervised classification of text documents
+===============================================
+
+This variation on the document classification theme (see
+document_classification_20newsgroups.py) showcases semisupervised learning:
+classification with training on partially unlabeled data.
+
+The dataset used in this example is the 20 newsgroups dataset which will be
+automatically downloaded and then cached; this set is labeled, but the
+labels from a random part will be removed.
```
**amueller** (Owner) added a note, Dec 20, 2011: I think it should be explicit that the fully supervised versions are trained only on the labeled subset of the data, while the semi-supervised ones can also use the additional unlabeled data.

**amueller** (Owner) added a note, Dec 20, 2011: It would be good to have a link to the narrative documentation in the docstring.
commented on the diff
examples/semisupervised_document_classification.py
```diff
@@ -0,0 +1,201 @@
+"""
+===============================================
+Semisupervised classification of text documents
+===============================================
+
+This variation on the document classification theme (see
+document_classification_20newsgroups.py) showcases semisupervised learning:
```
commented on the diff
examples/semisupervised_document_classification.py
```diff
((54 lines not shown))
+              help="Print ten most discriminative terms per class"
+                   " for every classifier.")
+
+(opts, args) = op.parse_args()
+if len(args) > 0:
+    op.error("this script takes no arguments.")
+    sys.exit(1)
+
+print __doc__
+op.print_help()
+print
+
+
+def split_indices(y, fraction):
+    """Random stratified split of indices into y
+
```
**amueller** (Owner) added a note, Dec 20, 2011: pep8: whitespace on blank line.
* larsmans: ENH semisupervised learning in Naive Bayes (EM algorithm) (6b7058b)
* larsmans: ENH semisupervised text classification demo, based heavily on ordinary text classification example (ade3df8)
* larsmans: ENH only re-estimate unlabeled samples in EMNB, as per Nigam et al. 2000 (48cb134)
* larsmans: ENH start EMNB from supervised model instead of zeros (d68986f)
* larsmans: ENH optimize EMNB: cache X and Y slicing (five-fold speed increase) (5c941ab)
* larsmans: COSMIT rename EMNB to SemisupervisedNB, less cryptic than a four-letter acronym (37e4e27)
* larsmans: ENH relabel option to SemisupervisedNB (3893872)
* larsmans: BUG fix Naive Bayes fit with non-empirical prior (5bdde93)
* larsmans: BUG swap E and M steps in SemisupervisedNB (5deb9f1)
* larsmans: ENH speed up EM for NB 10x by using linalg.norm (c5f72bf)
* larsmans: DOC add SemisupervisedNB to NB narrative docs (47f9f83)
* larsmans: give up on EM for Gaussian NB for now (e0d3808)
* larsmans: ENH stratified label deletion in semisup example + stricter tol in EM NB (5fa51b9)
* larsmans: TST EM for NB + add check for BaseDiscreteNB (fcb1142)
commented on the diff
doc/modules/naive_bayes.rst
```diff
@@ -173,3 +173,48 @@ It is advisable to evaluate both models, if time permits.
     `_ 3rd Conf. on Email and Anti-Spam (CEAS).
+
+.. _semisupervised_naive_bayes:
+
+Semisupervised training with EM
+-------------------------------
+
+The class ``SemisupervisedNB`` implements the expectation maximization (EM)
+algorithm for semisupervised training of Naive Bayes models,
+where a part of the training samples are unlabeled.
+Unlabeled data are indicated by a ``-1`` value in the label vector.
+
+This EM algorithm fits an initial model, then iteratively
+
+ * uses the current to predict fractional class memberships;
+ * fits a new model on its own predictions
```
**amueller** (Owner) added a note, Dec 20, 2011: I think it should somehow say that it is related to self-trained learning and link to Wikipedia.
* larsmans: DOC point out similarity with self-training at @amueller's request (d0e4847)
Owner

I still don't fully agree with the idea of "flattening" SemisupervisedNB so as not to be a meta-estimator. One problem with this is that the class will have to duplicate the parameters of the underlying model to get a comprehensive repr:

```
In [2]: SemisupervisedNB("bernoulli", alpha=1, binarize=2.)
Out[2]:
SemisupervisedNB(event_model='bernoulli', n_iter=10, relabel_all=True,
         tol=1e-05, verbose=False)
```

alpha and binarize aren't printed. In the meta-estimator case, we got this for free.

So, I suggest keeping the meta-estimator design after all. I know that "Flat is better than nested", but "There should be one--and preferably only one--obvious way to do it" and "that way may not be obvious at first unless you're Dutch." ;)

Owner

Thanks :)

* larsmans: DOC SemisupervisedNB in classes.rst (0ce2348)
* larsmans: Merge branch 'master' into emnb (conflicts: sklearn/preprocessing/__init__.py) (be29a50)
* larsmans: COSMIT naive_bayes pyflakes-clean (80b9c3e)
* larsmans: DOC link SemisupervisedNB (0e90277)
Owner

@larsmans the Dutch argument is cheating :)

I will try to find some time to read the code / examples to make my own French / Canadian opinion on how the meta-estimator-style API feels in practice.

I think one thing is missing here: the document length normalization, which is used in the original paper.

Owner

@fannix the Vectorizer class always takes care of that by default.

Owner

I gave alternative representations of "unlabeled" some more thought, but there seems to be no more "natural" value than -1 that is representable in all relevant types. Notably, None translates to nan as an np.float, but is not representable in np.int, which is one of the obvious candidates for class label types.
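A quick check of that point:

```python
import numpy as np

y = np.array([0, 1, -1])           # -1 fits in any signed integer label dtype
assert y.dtype.kind == "i"

y_f = np.array([0, 1, None], dtype=float)  # None only survives as nan...
assert np.isnan(y_f[2])
assert y_f[2] != y_f[2]            # ...and nan != nan makes it a poor sentinel
```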

Owner

@larsmans As I said in Malaga, I agree with you. It wasn't obvious to me why -1 is the best choice but we shouldn't over-complicate it ;)

Owner

@amueller, yes, I just wanted it on record here for @fannix and others :)

Owner
```
======================================================================
FAIL: Doctest: sklearn.preprocessing.LabelBinarizer
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/doctest.py", line 2166, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for sklearn.preprocessing.LabelBinarizer
  File "/home/ogrisel/coding/scikit-learn/sklearn/preprocessing/__init__.py", line 458, in LabelBinarizer

----------------------------------------------------------------------
File "/home/ogrisel/coding/scikit-learn/sklearn/preprocessing/__init__.py", line 495, in sklearn.preprocessing.LabelBinarizer
Failed example:
    clf.fit([1, 2, 6, 4, 2])
Expected:
    LabelBinarizer()
Got:
    LabelBinarizer(unlabeled=-1)
----------------------------------------------------------------------
File "/home/ogrisel/coding/scikit-learn/sklearn/preprocessing/__init__.py", line 499, in sklearn.preprocessing.LabelBinarizer
Failed example:
    clf.transform([1, 6])
Expected:
    array([[ 1.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  1.]])
Got:
    array([[ 0.25,  0.25,  0.25,  0.25],
           [ 0.  ,  0.  ,  0.  ,  1.  ]])

>>  raise self.failureException(self.format_failure(<StringIO.StringIO instance at 0x46b2e60>.getvalue()))
```
commented on the diff
sklearn/tests/test_naive_bayes.py
```diff
@@ -117,3 +118,15 @@ def test_sample_weight():
                               sample_weight=[1, 1, 4])
     assert_array_equal(clf.predict([1, 0]), [1])
     assert_array_almost_equal(np.exp(clf.intercept_), [1 / 3., 2 / 3.])
+
+
+def test_semisupervised():
+    X = scipy.sparse.csr_matrix([[4, 3, 1],
+                                 [5, 2, 1],
+                                 [0, 1, 7],
+                                 [0, 1, 6]])
+    y = np.array([1, -1, -1, 2])
+    for clf in (BernoulliNB(), MultinomialNB()):
+        semi_clf = SemisupervisedNB(clf, n_iter=20, tol=1e6)
+        semi_clf.fit(X, y)
+        assert_array_equal(semi_clf.predict([[5, 0, 0], [1, 1, 4]]), [1, 2])
```
**ogrisel** (Owner) added a note, Dec 27, 2011: The coverage report shows that this test is a bit too easy, as convergence is reached at the first iteration (old_coef and old_intercept are not updated). Also, I would update this test to check that a dense array X and its CSR variant yield the same outcome (in terms of predicted probabilities, for instance).
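The dense-vs-CSR check suggested here could look roughly like this (using a plain `MultinomialNB` for illustration, since the idea applies to any estimator accepting both input types):

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.naive_bayes import MultinomialNB

X = np.array([[4, 3, 1], [5, 2, 1], [0, 1, 7], [0, 1, 6]])
y = np.array([1, 1, 2, 2])

proba_dense = MultinomialNB().fit(X, y).predict_proba(X)
proba_sparse = MultinomialNB().fit(csr_matrix(X), y).predict_proba(csr_matrix(X))

# Both input types should yield the same predicted probabilities.
assert np.allclose(proba_dense, proba_sparse)
```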
commented on the diff
sklearn/naive_bayes.py
```diff
((65 lines not shown))
+
+        Returns
+        -------
+        self : object
+            Returns self.
+        """
+
+        clf = self.estimator
+        X = atleast2d_or_csr(X)
+        Y = clf._label_1ofK(y)
+
+        labeled = np.where(y != -1)[0]
+        if self.relabel_all:
+            unlabeled = np.where(y == -1)[0]
+            X_unlabeled = X[unlabeled, :]
+            Y_unlabeled = Y[unlabeled, :]
```
**ogrisel** (Owner) added a note, Dec 27, 2011: Y_unlabeled is not defined if relabel_all is False. This case needs a test.
commented on the diff
sklearn/naive_bayes.py
```diff
((84 lines not shown))
+
+        clf._fit1ofK(X[labeled, :], Y[labeled, :],
+                     sample_weight[labeled, :] if sample_weight else None,
+                     class_prior)
+        old_coef = clf.coef_.copy()
+        old_intercept = clf.intercept_.copy()
+
+        for i in xrange(self.n_iter):
+            if self.verbose:
+                print "Naive Bayes EM, iteration %d," % i,
+
+            # E
+            if self.relabel_all:
+                Y = clf.predict_proba(X)
+            else:
+                Y_unlabeled[:] = clf.predict_proba(X_unlabeled)
```
**ogrisel** (Owner) added a note, Dec 27, 2011: Y_unlabeled seems to never be used in this loop. I guess the M-step should also test whether self.relabel_all is true or false to know which Y to use for fitting the new model.
commented on the diff
sklearn/naive_bayes.py
```diff
((68 lines not shown))
+        self : object
+            Returns self.
+        """
+
+        clf = self.estimator
+        X = atleast2d_or_csr(X)
+        Y = clf._label_1ofK(y)
+
+        labeled = np.where(y != -1)[0]
+        if self.relabel_all:
+            unlabeled = np.where(y == -1)[0]
+            X_unlabeled = X[unlabeled, :]
+            Y_unlabeled = Y[unlabeled, :]
+
+        n_features = X.shape[1]
+        tol = self.tol * n_features
```
**ogrisel** (Owner) added a note, Dec 27, 2011: I think the tol should also be multiplied by the mean std of the feature values (or their max absolute value) so as to make the tolerance criterion insensitive to feature re-scaling.
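One possible reading of that suggestion (function name hypothetical; the `max(..., eps)` guard handles degenerate all-zero input):

```python
import numpy as np

def effective_tol(X, tol):
    """Scale the per-coefficient tolerance by the dimensionality and by the
    features' typical magnitude, so the stopping criterion does not change
    when the features are re-scaled."""
    scale = max(np.abs(X).max(), 1e-12)   # or: X.std(axis=0).mean()
    return tol * X.shape[1] * scale
```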
Owner

About the nested constructor issue: I am still not sold on the current API. Here is another proposal: what about making SemisupervisedNB a mixin class and generating two concrete classes, SemisupervisedBernoulliNB and SemisupervisedMultinomialNB? As the current implementation uses lambda expressions to simulate some kind of inheritance to build 3 read-only properties and one method, using real inheritance through mixins might make more sense here; furthermore, we would get flat constructors for free, and advanced users could extend through inheritance rather than composition. WDYT?

This sounds good. I think it might be better to have a Semisupervised mixin class which can admit unlabeled data.
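The proposed mixin design might look roughly like this (all class and method names hypothetical, with a hard-label self-training loop standing in for the real EM machinery):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

class SemisupervisedMixin:
    """Shared self-training loop; assumes the subclass is an NB estimator."""

    def fit_semisupervised(self, X, y, n_iter=10):
        labeled = y != -1                 # -1 marks unlabeled samples
        self.fit(X[labeled], y[labeled])  # initial supervised model
        for _ in range(n_iter):
            y_self = self.predict(X)      # E-step (hard labels for brevity)
            y_self[labeled] = y[labeled]  # keep the true labels
            self.fit(X, y_self)           # M-step
        return self

# Flat constructors come for free: each concrete class keeps its NB params.
class SemisupervisedMultinomialNB(SemisupervisedMixin, MultinomialNB):
    pass

class SemisupervisedBernoulliNB(SemisupervisedMixin, BernoulliNB):
    pass
```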

referenced this pull request
Closed

### Expectation Maximization for Supervised Naive Bayes Classifiers #1310

Owner

As estimator can only be BernoulliNB or MultinomialNB, and the only one that will probably be added in the future is GaussianNB, I think I am +1 for keywords or what @ogrisel proposed above.

Owner

If I ever try this again I'll just rewrite the code, so I'm closing this PR.

closed this
referenced this pull request
Closed

### Feature request: Additional Clustering Algorithms:BIRCH #2690

Commits on Dec 20, 2011: 14 commits by larsmans (listed above).
Commits on Dec 21, 2011: 5 commits by larsmans.
Commits on Dec 22, 2011: 1 commit by larsmans.
1  doc/modules/classes.rst
```diff
@@ -694,6 +694,7 @@ Pairwise metrics
     naive_bayes.GaussianNB
     naive_bayes.MultinomialNB
     naive_bayes.BernoulliNB
+    naive_bayes.SemisupervisedNB

 .. _neighbors_ref:
```
50 doc/modules/naive_bayes.rst