RadiusNeighborsClassifier: possible problem with outliers class? #6902

Closed
LeonieBorne opened this Issue Jun 17, 2016 · 5 comments

@LeonieBorne
Contributor

LeonieBorne commented Jun 17, 2016

Hi,

When I use RadiusNeighborsClassifier with an outlier class, I get this error:

Traceback (most recent call last):

  File "<ipython-input-238-0224ec628811>", line 17, in <module>
    scores.append(clf.score(X_test, y_test))

  File "/volatile/anaconda2/lib/python2.7/site-packages/sklearn/base.py", line 310, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)

  File "/volatile/anaconda2/lib/python2.7/site-packages/sklearn/neighbors/classification.py", line 379, in predict
    in zip(pred_labels[inliers], weights)],

  File "/volatile/anaconda2/lib/python2.7/site-packages/sklearn/utils/extmath.py", line 404, in weighted_mode
    w = np.zeros(a.shape, dtype=w.dtype) + w

ValueError: operands could not be broadcast together with shapes (101,) (82,) 

I'm wondering whether the line `in zip(pred_labels[inliers], weights)]` needs to be replaced with `in zip(pred_labels[inliers], weights[inliers])]`.
I'm using sklearn version 0.17.1.
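
To illustrate why the unfiltered `weights` could produce a shape mismatch, here is a minimal NumPy-only sketch. It does not call scikit-learn; the arrays and the `pair_shapes` helper are hypothetical stand-ins for the per-query neighbour labels and weights named in the traceback, and query 1 is assumed to be an outlier with no neighbours.

```python
import numpy as np

# Hypothetical per-query neighbour labels; query 1 has no neighbours (an outlier).
pred_labels = np.empty(3, dtype=object)
pred_labels[0] = np.array([0, 1])
pred_labels[1] = np.array([], dtype=int)
pred_labels[2] = np.array([1, 1, 0])

# Hypothetical per-query neighbour weights, aligned with pred_labels.
weights = np.empty(3, dtype=object)
weights[0] = np.array([0.5, 2.0])
weights[1] = np.array([])
weights[2] = np.array([1.0, 3.0, 0.2])

# Indices of queries that have at least one neighbour.
inliers = [i for i, labels in enumerate(pred_labels) if len(labels) > 0]


def pair_shapes(labels, wts):
    """Return the (labels, weights) shapes that zip would pair together."""
    return [(l.shape, w.shape) for l, w in zip(labels, wts)]


# Filtering only the labels shifts the pairing after the first outlier.
unfiltered = pair_shapes(pred_labels[inliers], weights)
# Filtering both keeps each query's labels next to its own weights.
filtered = pair_shapes(pred_labels[inliers], weights[inliers])

print(unfiltered)
print(filtered)
```

In the unfiltered pairing, the labels of the second inlier are matched with the empty weights of the outlier, which is the kind of mismatch that a broadcast inside `weighted_mode` would reject.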

@LeonieBorne LeonieBorne changed the title from RadiusNearestNeighborsClassifier: possible problem with outliers class? to RadiusNeighborsClassifier: possible problem with outliers class? Jun 17, 2016

@lesteve

Member

lesteve commented Jun 17, 2016

Readability counts, a lot! Please use triple back-quotes aka fenced code blocks to format error messages and code snippets.

Could you try to provide a snippet that reproduces the problem?

@LeonieBorne

Contributor

LeonieBorne commented Jun 20, 2016

Sorry about the readability; I've fixed that.

Here is a snippet that reproduces the problem most of the time (whenever the algorithm finds an outlier, I think):

from sklearn import neighbors
import numpy as np
from sklearn.cross_validation import StratifiedKFold

# RADIUS NEIGHBORS
# Build a symmetric distance matrix with a zero diagonal
X = np.random.rand(100, 100)
for i in range(0, len(X)):
    X[i, i] = 0
    for j in range(0, len(X)):
        X[i, j] = X[j, i]

y = np.random.randint(2, size=100)

clf = neighbors.RadiusNeighborsClassifier(radius=0.01, weights='distance',
                                          metric='precomputed', outlier_label=2)

# Cross-validation
skf = StratifiedKFold(y, 5)
scores = []
for train, test in skf:
    # Training distances: train rows x train columns
    X_train = X[train, :]
    X_train = X_train[:, train]
    # Test distances: test rows x train columns
    X_test = X[test, :]
    X_test = X_test[:, train]
    y_train = y[train]
    y_test = y[test]
    clf.fit(X_train, y_train)
    scores.append(clf.score(X_test, y_test))

scores = np.array(scores)
print "Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2)

====>

Traceback (most recent call last):

  File "<ipython-input-17-cbcf29e6583f>", line 36, in <module>
    scores.append(clf.score(X_test, y_test))

  File "/volatile/anaconda2/lib/python2.7/site-packages/sklearn/base.py", line 310, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)

  File "/volatile/anaconda2/lib/python2.7/site-packages/sklearn/neighbors/classification.py", line 379, in predict
    in zip(pred_labels[inliers], weights)],

  File "/volatile/anaconda2/lib/python2.7/site-packages/sklearn/utils/extmath.py", line 404, in weighted_mode
    w = np.zeros(a.shape, dtype=w.dtype) + w

ValueError: operands could not be broadcast together with shapes (2,) (3,) 
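
As a side note on the slicing in the snippet above, the two-step row and column selection can also be written with `np.ix_`. The following is only a sketch on a small hypothetical 6x6 matrix: the fixed seed, the fold indices, and the names `D`, `D_train`, and `D_test` are assumptions for illustration, and the matrix is symmetrized by averaging rather than by the loop used in the snippet.

```python
import numpy as np

# Small symmetric matrix with a zero diagonal (illustrative data only).
rng = np.random.RandomState(0)
A = rng.rand(6, 6)
D = (A + A.T) / 2.0
np.fill_diagonal(D, 0.0)

# Hypothetical fold indices.
train = np.array([0, 1, 2, 3])
test = np.array([4, 5])

# With metric='precomputed', fit expects (n_train, n_train) distances and
# predict expects (n_test, n_train) distances.
D_train = D[np.ix_(train, train)]
D_test = D[np.ix_(test, train)]

print(D_train.shape, D_test.shape)
```
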
@jnothman

Member

jnothman commented Jun 20, 2016

I can replicate this (using random seed 0), and I agree with your solution. A PR with a test would be very welcome.

@jnothman jnothman added Bug Easy labels Jun 20, 2016

@kris-singh

kris-singh commented Jun 21, 2016

I found that setting the seed to a random number always results in an error, but the code works fine for odd values.

@LeonieBorne

Contributor

LeonieBorne commented Jun 22, 2016

I think it's because no outliers are found during cross-validation for those seed values. Maybe it's only a coincidence? I found that the code still fails with the seed set to 3 or 7.
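
One way to check this hypothesis for a given seed would be to count, per fold, how many test points have no training point within the radius. The helper below is a hypothetical sketch using only NumPy; it assumes a neighbour is any training point at distance less than or equal to `radius`, which may not match the library's exact rule.

```python
import numpy as np


def count_outliers(D_test, radius):
    """Count test rows with no training point within `radius`.

    D_test is an (n_test, n_train) array of precomputed distances.
    """
    D_test = np.asarray(D_test)
    has_neighbor = (D_test <= radius).any(axis=1)
    return int((~has_neighbor).sum())


# Tiny illustrative example: the first row has a neighbour at distance
# 0.005, the second row has none within radius 0.01.
example = np.array([[0.005, 0.5],
                    [0.3, 0.9]])
n_outliers = count_outliers(example, radius=0.01)
print(n_outliers)
```

If the count is zero for every fold under a given seed, the outlier branch is never exercised, which would be consistent with the code appearing to work for that seed.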
