Errors in NearestNeighborLearner

```
What steps will reproduce the problem?
I tried to use learning.NearestNeighborLearner on the Sex Classification 
dataset from the Wikipedia article on Naive Bayes classifiers: 
http://en.wikipedia.org/wiki/Naive_Bayes_classifier#Sex_Classification

What is the expected output? What do you see instead?
The program wouldn't run due to bugs in the implementation of 
NearestNeighborLearner.

What version of the product are you using?
Bug exists in r30


Please provide any additional information below.
Here's my sample code:

import learning

examples = [[6,180,12,'male'],[5.92,190,11,'male'],[5.58,170,12,'male'],
[5,100,6,'female'],[5.5,150,8,'female'],[5.42,130,7,'female'],
[5.75,150,9,'female']]

ds = learning.DataSet(examples)
nnl = learning.NearestNeighborLearner(2)
nnl.train(ds)
print nnl.predict([5.1,105,6.3])

And I would expect it to print 'female'.

I believe the following fixes should work:
old learning.py, lines 217 - 231

        else:
            ## Maintain a sorted list of (distance, example) pairs.
            ## For very large k, a PriorityQueue would be better
            best = [] 
            for e in examples:
                d = self.distance(e, example)
                if len(best) < k: 
                    e.append((d, e))
                elif d < best[-1][0]:
                    best[-1] = (d, e)
                    best.sort()
            return mode([e[self.dataset.target] for (d, e) in best])

    def distance(self, e1, e2):
        return mean_boolean_error(e1, e2)


new learning.py:

        else:
            ## Maintain a sorted list of (distance, example) pairs.
            ## For very large k, a PriorityQueue would be better
            best = [] 
            for e in self.dataset.examples:
                d = self.distance(e, example)
                if len(best) < self.k: 
                    best.append((d, e))
                elif d < best[-1][0]:
                    best[-1] = (d, e)
                    best.sort()
            return mode([e[self.dataset.target] for (d, e) in best])

    def distance(self, e1, e2):
        return mean_error(e1, e2)


Specifically:
1) changed 'examples' to self.dataset.examples, and 'k' to self.k.
2) changed e.append((d,e)) to best.append((d, e))
3) and I could be wrong, but I believe you wanted mean_error, not 
mean_boolean_error in your distance function.

For the gender classification example, it seems to work great. Thanks!
```

Original issue reported on code.google.com by `tblana...@gmail.com` on 19 Oct 2010 at 5:21
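
For anyone who wants to check the expected result without the library, here is a minimal standalone sketch of the same k-nearest-neighbor vote in Python 3. It is not the code from learning.py: `knn_predict` is a hypothetical name, `mean_error` is a local stand-in for the helper of the same name (its real definition is not shown in this report), and the sketch assumes the label is the last column of each example. It uses `heapq.nsmallest` in place of the hand-maintained sorted list, so the result does not depend on the order in which the first k candidates arrive.

```python
import heapq
from collections import Counter


def mean_error(xs, ys):
    """Mean absolute difference over paired values (stand-in, assumed definition).

    zip() stops at the shorter input, so a trailing label on xs is ignored
    when ys holds only the numeric features.
    """
    diffs = [abs(x - y) for x, y in zip(xs, ys)]
    return sum(diffs) / len(diffs)


def knn_predict(examples, query, k, target=-1):
    """Return the most common label among the k examples nearest to query."""
    # Score every example by distance to the query; keep the index so that
    # ties on distance are broken deterministically by position.
    scored = [(mean_error(e, query), i) for i, e in enumerate(examples)]
    nearest = heapq.nsmallest(k, scored)
    votes = Counter(examples[i][target] for _, i in nearest)
    return votes.most_common(1)[0][0]


# Dataset and query taken from the report above.
examples = [
    [6, 180, 12, 'male'], [5.92, 190, 11, 'male'], [5.58, 170, 12, 'male'],
    [5, 100, 6, 'female'], [5.5, 150, 8, 'female'], [5.42, 130, 7, 'female'],
    [5.75, 150, 9, 'female'],
]

print(knn_predict(examples, [5.1, 105, 6.3], k=2))  # prints female
```

With k=2 the two closest examples are [5,100,6] and [5.42,130,7], both labeled 'female', which matches the output expected in the report. Because the features are not rescaled, the weight column dominates the distance.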


Errors in NearestNeighborLearner #21