
Two fixes in NaiveBayesClassifier #224

Merged
merged 1 commit

4 participants

@apresta
  • Fixed a regression in prob_classify(): we cannot iterate on a dictionary and change it at the same time. Iterate on a copy of the keys instead.

  • In train(), switched from defaultdict(FreqDist) to ConditionalFreqDist()
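The first fix above guards against a classic Python pitfall: you cannot add or delete keys while iterating over the dictionary itself. A small standalone sketch of the buggy and fixed patterns (the feature names here are made up for illustration):

```python
# Buggy pattern: deleting keys during iteration over the dict itself
# raises "RuntimeError: dictionary changed size during iteration".
featureset = {"a": 1, "b": 2, "unseen": 3}
known = {"a", "b"}

try:
    for fname in featureset:
        if fname not in known:
            del featureset[fname]
except RuntimeError as e:
    print("iteration failed:", e)

# Safe pattern: iterate over a snapshot of the keys instead.
featureset = {"a": 1, "b": 2, "unseen": 3}
for fname in list(featureset):
    if fname not in known:
        del featureset[fname]
print(sorted(featureset))  # ['a', 'b']
```

Note that the patch's `featureset.keys()` works under Python 2, where `keys()` returns a list; under Python 3 it returns a live view, so an explicit snapshot such as `list(featureset)` is needed.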

While we're here, can someone briefly explain the status of this GitHub repo? Is it the official development repository now?
I ask because of the regression I found (it's fixed in the SVN repo).
Also, the nltk.compat module seems to be gone. Was this intended, or is it another regression?

@apresta apresta fixed a regression in NaiveBayesClassifier.prob_classify;
switched to ConditionalFreqDist in NaiveBayesClassifier.train
481e9ef
@xim xim merged commit 19dec81
@xim
Collaborator

Yes, GitHub is the official source repository for NLTK now. The regression was a minor error made during a refactor. Thanks for the fix! ^_^

The commit that removed compat.py was deliberate:

commit 8617fa7
Author: Steven Bird stevenbird1@gmail.com
Date: Wed Nov 2 04:32:30 2011 +1100

removed support for Python 2.4
@stevenbird
Owner

Yes, that was deliberate. What's the reason for this change please: "In train(), switched from defaultdict(FreqDist) to ConditionalFreqDist()"?

@apresta

The only reason is consistency. I noticed there was a ConditionalFreqDist class that is essentially a defaultdict(FreqDist), so why not take advantage of it? That is, unless I misinterpreted the original intent.

@xim
Collaborator

ConditionalFreqDist has some extra functionality that isn't used here, so it would be preferable to make it clear we're not using it - e.g. by making a defaultdict(FreqDist). Will undo that part =)
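For context, the defaultdict(FreqDist) pattern being restored counts feature values per (label, feature-name) pair. A rough standalone sketch of that counting scheme, using collections.Counter as a stand-in for NLTK's FreqDist so it runs without NLTK (the feature names and data are invented):

```python
from collections import Counter, defaultdict

# feature_freqdist maps (label, fname) -> distribution of feature values.
# Counter stands in for nltk's FreqDist here; both default counts to zero.
feature_freqdist = defaultdict(Counter)

labeled_featuresets = [
    ({"contains(money)": True}, "spam"),
    ({"contains(money)": False}, "ham"),
    ({"contains(money)": True}, "spam"),
]

for featureset, label in labeled_featuresets:
    for fname, fval in featureset.items():
        feature_freqdist[(label, fname)][fval] += 1

print(feature_freqdist[("spam", "contains(money)")][True])  # 2
```

The defaultdict keeps the intent obvious: only plain keyed counting is needed, none of ConditionalFreqDist's extra plotting/tabulation functionality.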

@apresta

Alright, makes sense to me. Sorry for that!

@stevenbird
Owner

Please submit a new pull request to undo this change, thanks.

@apresta

Morten already did that: ddb2543

@anirudhs2005

I am doing a project on spam classification. The first algorithm I ran was Naive Bayes. It ran perfectly on Feb 10 and for a few days after that, and I haven't changed the code since; I have snapshots of the output too.
Now the classifier has suddenly gone awry and pops up the error below. I thought it was a problem with Naive Bayes, so I also ran MaxEnt; the result is the same.
Could you please help me out? I think many others have faced this problem. What could be the solution?

>>> import spambot as s
>>> cl = s.spamclassifier()
>>> cl.classifier()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "spambot.py", line 58, in classifier
    classifier = NaiveBayesClassifier.train(training_set)
  File "/usr/local/lib/python2.7/dist-packages/nltk/classify/naivebayes.py", line 215, in train
    label_probdist = estimator(label_freqdist)
  File "/usr/local/lib/python2.7/dist-packages/nltk/probability.py", line 916, in __init__
    LidstoneProbDist.__init__(self, freqdist, 0.5, bins)
  File "/usr/local/lib/python2.7/dist-packages/nltk/probability.py", line 802, in __init__
    'must have at least one bin.')
ValueError: A ELE probability distribution must have at least one bin.
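For what it's worth, this traceback usually means train() was handed an empty training set (for example, because feature extraction or data loading silently produced nothing), so the label frequency distribution has zero bins. A hypothetical sketch of that failure mode, not the actual NLTK code:

```python
from collections import Counter

def check_training_set(labeled_featuresets):
    # Hypothetical guard illustrating the cause of the error above:
    # count the labels; with no training examples there are no bins,
    # and an ELE/Lidstone estimator cannot be built over zero bins.
    label_freqdist = Counter(label for _, label in labeled_featuresets)
    if len(label_freqdist) == 0:
        raise ValueError("A ELE probability distribution "
                         "must have at least one bin.")
    return label_freqdist

try:
    check_training_set([])  # empty training set -> no bins -> error
except ValueError as e:
    print(e)

print(check_training_set([({"f": 1}, "spam")]))
```

So the first thing to check is that training_set is actually non-empty at the point where NaiveBayesClassifier.train is called.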

@stevenbird
Owner

Please submit a new issue, with a small code sample that allows us to replicate the problem, thanks.

Commits on Feb 18, 2012
  1. @apresta (apresta authored)

     fixed a regression in NaiveBayesClassifier.prob_classify;
     switched to ConditionalFreqDist in NaiveBayesClassifier.train
Showing with 3 additions and 3 deletions.
  1. +3 −3 nltk/classify/naivebayes.py
6 nltk/classify/naivebayes.py
@@ -34,7 +34,7 @@
from collections import defaultdict
-from nltk.probability import FreqDist, DictionaryProbDist, ELEProbDist, sum_logs
+from nltk.probability import FreqDist, ConditionalFreqDist, DictionaryProbDist, ELEProbDist, sum_logs
from api import ClassifierI
@@ -94,7 +94,7 @@ def prob_classify(self, featureset):
# Otherwise, we'll just assign a probability of 0 to
# everything.
featureset = featureset.copy()
- for fname in featureset:
+ for fname in featureset.keys():
for label in self._labels:
if (label, fname) in self._feature_probdist:
break
@@ -184,7 +184,7 @@ def train(labeled_featuresets, estimator=ELEProbDist):
i.e., a list of tuples ``(featureset, label)``.
"""
label_freqdist = FreqDist()
- feature_freqdist = defaultdict(FreqDist)
+ feature_freqdist = ConditionalFreqDist()
feature_values = defaultdict(set)
fnames = set()