Memory error while predicting with KNeighborsClassifier #2409

sagar81 opened this Issue · 2 comments

3 participants


Following is the piece of code I wrote: it performs feature selection with RFE using a LinearSVC estimator, then fits a KNeighborsClassifier on the reduced data and predicts with it.

from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE
from sklearn.neighbors import KNeighborsClassifier

# Feature selection: RFE wrapped around a LinearSVC estimator
clf = LinearSVC(C=10, class_weight='auto')
rfe = RFE(estimator=clf, n_features_to_select=700, step=42)
rfe.fit(X, trainLabels)
reduced_train_data = rfe.transform(X)
print "reduced_train_data.shape ", reduced_train_data.shape
reduced_test_data = rfe.transform(test)

# Fit KNN on the reduced training data, then predict on the reduced test data
neigh = KNeighborsClassifier(n_neighbors=5, weights='distance', algorithm='ball_tree')
print "knn initiated"
neigh.fit(reduced_train_data, trainLabels)
print "knn fitted"
test_predict = neigh.predict(reduced_test_data)
print "knn predicted"

Following is the output:

reduced_train_data.shape  (42000, 700)
knn initiated
knn fitted

And then I see the following error:

Traceback (most recent call last):
  File "E:\Coursera\KaggleDataProjects\DigitRecognition\", line 74, in <module>
    test_predict = neigh.predict(reduced_test_data)
  File "C:\Python27\lib\site-packages\sklearn\neighbors\", line 146, in predict
    neigh_dist, neigh_ind = self.kneighbors(X)
  File "C:\Python27\lib\site-packages\sklearn\neighbors\", line 313, in kneighbors
  File "binary_tree.pxi", line 1295, in sklearn.neighbors.ball_tree.BinaryTree.query (sklearn\neighbors\ball_tree.c:9889)
  File "C:\Python27\lib\site-packages\sklearn\utils\", line 91, in array2d
    X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
  File "C:\Python27\lib\site-packages\numpy\core\", line 320, in asarray
    return array(a, dtype, copy=False, order=order)
MemoryError

This error does not happen every time I run the code; whether it appears seems to depend on slight changes to the parameters. Can someone confirm whether this is a bug, or whether something else is going on here?

Initial dimension of train data (X) = 42000, 784
Initial dimension of test data (test) = 28000, 784
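
For context, a rough back-of-the-envelope estimate of the array sizes involved (a sketch only, assuming dense float64 arrays; the actual dtype depends on how the data was loaded):

n_train, n_test, n_features = 42000, 28000, 700
bytes_per_value = 8  # float64
print("reduced train: ~%.0f MB" % (n_train * n_features * bytes_per_value / 1e6))  # ~235 MB
print("reduced test:  ~%.0f MB" % (n_test * n_features * bytes_per_value / 1e6))   # ~157 MB
# Any extra copy made while validating/converting the query array
# (e.g. a dtype or memory-order conversion) adds roughly another array of that size.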

I am using version 0.14.1


How much RAM does your system have? Can you provide an example of a script that does not cause the error with the same data?

Also, can you try to recreate the error using something from sklearn.datasets? It makes debugging a little easier.
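
For reference, a minimal sketch of what such a reproduction script could look like (make_classification and the shapes below are assumptions chosen to roughly match the reported 42000 x 700 train and 28000 x 700 test data, not the original script):

from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data with roughly the reported dimensions
X, y = make_classification(n_samples=70000, n_features=700, n_informative=50,
                           n_classes=10, random_state=0)
X_train, y_train = X[:42000], y[:42000]
X_test = X[42000:]

neigh = KNeighborsClassifier(n_neighbors=5, weights='distance', algorithm='ball_tree')
neigh.fit(X_train, y_train)
test_predict = neigh.predict(X_test)  # the reported MemoryError occurs at this step
print(test_predict.shape)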

@amueller amueller added this to the 0.15.1 milestone

Closing for lack of feedback.

@amueller amueller closed this