RadiusNeighborsClassifier: possible problem with outliers class? #6902
Comments
Readability counts, a lot! Please use triple back-quotes aka fenced code blocks to format error messages and code snippets. Could you try to provide a snippet that reproduces the problem?
Sorry for the readability, I've modified that. Here is a snippet that reproduces the problem most of the time (when the algorithm finds an outlier, I think):

from sklearn import neighbors
import numpy as np
from sklearn.cross_validation import StratifiedKFold

########################################################### RADIUS NEIGHBORS
# Construction of the distance matrix
X = np.random.rand(100, 100)
for i in range(0, len(X)):
    X[i, i] = 0
    for j in range(0, len(X)):
        X[i, j] = X[j, i]
y = np.random.randint(2, size=100)
clf = neighbors.RadiusNeighborsClassifier(radius=0.01, weights='distance', metric='precomputed', outlier_label=2)

# Cross validation
skf = StratifiedKFold(y, 5)
scores = []
for train, test in skf:
    X_train = X[train, :]
    X_train = X_train[:, train]
    X_test = X[test, :]
    X_test = X_test[:, train]
    y_train = y[train]
    y_test = y[test]
    clf.fit(X_train, y_train)
    scores.append(clf.score(X_test, y_test))
scores = np.array(scores)
print "Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2)

====> Traceback (most recent call last):
  File "<ipython-input-17-cbcf29e6583f>", line 36, in <module>
    scores.append(clf.score(X_test, y_test))
  File "/volatile/anaconda2/lib/python2.7/site-packages/sklearn/base.py", line 310, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "/volatile/anaconda2/lib/python2.7/site-packages/sklearn/neighbors/classification.py", line 379, in predict
    in zip(pred_labels[inliers], weights)],
  File "/volatile/anaconda2/lib/python2.7/site-packages/sklearn/utils/extmath.py", line 404, in weighted_mode
    w = np.zeros(a.shape, dtype=w.dtype) + w
ValueError: operands could not be broadcast together with shapes (2,) (3,)
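The last frame of the traceback can be reproduced in isolation. This is a minimal sketch assuming only NumPy; the arrays `a` and `w` are made-up stand-ins for one sample's neighbor labels and a weights entry of a different length, and `broadcast_weights` is a hypothetical wrapper around the expression shown in the traceback, not a scikit-learn function.

```python
import numpy as np

# Hypothetical stand-ins: labels of 2 neighbors, weights for 3 neighbors
a = np.array([0, 1])
w = np.array([0.5, 0.2, 0.3])


def broadcast_weights(a, w):
    # Same expression as the failing line in weighted_mode (per the traceback)
    return np.zeros(a.shape, dtype=w.dtype) + w


try:
    broadcast_weights(a, w)
    failed = False
    message = ""
except ValueError as exc:
    # Shapes (2,) and (3,) cannot be broadcast against each other
    failed = True
    message = str(exc)
```

So the error itself only says that a label array and a weight array of different lengths were paired; it does not say why they were paired.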
I can replicate (using random seed 0), and I agree with your solution. A PR with a test would be very welcome.
I found that setting the seed to a random number always results in an error, but the code works fine for odd values.
I think it's because no outliers are found during the cross validation for these seed values. Maybe it's only a coincidence? I found that the code still fails for seeds set to 3 or 7.
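One way to check the outlier hypothesis directly is to count, for a given seed, the test rows that have no training sample within the radius. This is a rough sketch assuming NumPy only: `count_outliers` and `symmetric_distances` are hypothetical helpers, the split is a plain contiguous one rather than `StratifiedKFold`, the random stream is not the same as in the snippet above, and the `<=` comparison is an assumption about how the radius is applied.

```python
import numpy as np


def count_outliers(dist_test_train, radius):
    # A test row counts as an outlier if no training sample lies within the radius
    has_neighbor = (dist_test_train <= radius).any(axis=1)
    return int((~has_neighbor).sum())


def symmetric_distances(n, seed):
    # Hypothetical helper: random symmetric matrix with a zero diagonal
    rng = np.random.RandomState(seed)
    lower = np.tril(rng.rand(n, n), -1)
    return lower + lower.T


X = symmetric_distances(100, seed=0)
test = np.arange(0, 20)      # simple contiguous split, not StratifiedKFold
train = np.arange(20, 100)

# Rows are test samples, columns are training samples
n_outliers = count_outliers(X[np.ix_(test, train)], radius=0.01)
```

Repeating this over several seeds and folds would show whether the failing seeds are the ones where `n_outliers` is nonzero.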
Hi,
When I use the RadiusNeighborsClassifier with an outlier class, I get this message:
I'm wondering if the line
in zip(pred_labels[inliers], weights)]
needs to be replaced with
in zip(pred_labels[inliers], weights[inliers])]
I'm using version 0.17.1 of sklearn.
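To illustrate why indexing only one side of the `zip` could matter, here is a small sketch with made-up data, assuming NumPy only. The `ragged` helper and the example values are hypothetical; the arrays just mimic per-sample neighbor labels and weights where one sample has no neighbors, and this is not the library's actual code.

```python
import numpy as np


def ragged(rows):
    # Build a 1-D object array whose entries are per-sample arrays
    out = np.empty(len(rows), dtype=object)
    for i, row in enumerate(rows):
        out[i] = np.asarray(row)
    return out


# Hypothetical neighbor labels and weights for 3 test samples;
# sample 1 has no neighbors within the radius, so it is an outlier.
pred_labels = ragged([[0, 1], [], [1, 1, 0]])
weights = ragged([[2.0, 1.0], [], [1.0, 3.0, 1.0]])

inliers = [i for i, pl in enumerate(pred_labels) if len(pl) != 0]

# Pairing as in the quoted line: the second inlier meets the outlier's weights
misaligned = [(len(pl), len(w))
              for pl, w in zip(pred_labels[inliers], weights)]

# Proposed pairing: both sides restricted to the inliers
aligned = [(len(pl), len(w))
           for pl, w in zip(pred_labels[inliers], weights[inliers])]
```

In this toy case the unindexed pairing gives label and weight arrays of different lengths for every inlier that follows the outlier, while the indexed pairing keeps them matched.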