Memory error while predicting with KNeighborsClassifier #2409

Closed
sagar81 opened this Issue Aug 30, 2013 · 2 comments

@sagar81

Below is the code I wrote to do feature selection with RFE (using LinearSVC as the estimator) and then fit and predict with KNeighborsClassifier on the reduced data.

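from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE
from sklearn.neighbors import KNeighborsClassifier

# X, trainLabels and test are loaded earlier in main.py (not shown here)
# Feature selection: RFE with a LinearSVC estimator, keeping 700 features
clf = LinearSVC(C=10, class_weight='auto')
rfe = RFE(estimator=clf, n_features_to_select=700, step=42)
rfe.fit(X, trainLabels)
reduced_train_data = rfe.transform(X)
print "reduced_train_data.shape ", reduced_train_data.shape
reduced_test_data = rfe.transform(test)
# Fit KNN on the reduced training data, then predict on the reduced test data
neigh = KNeighborsClassifier(n_neighbors=5, weights='distance', algorithm='ball_tree')
print "knn initiated"
neigh.fit(reduced_train_data, trainLabels)
print "knn fitted"
test_predict = neigh.predict(reduced_test_data)  # MemoryError is raised here
print "knn predicted"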

The output is:

reduced_train_data.shape (42000, 700)
knn initiated
knn fitted

And then I see the following error:

Traceback (most recent call last):
  File "E:\Coursera\KaggleDataProjects\DigitRecognition\main.py", line 74, in <module>
    test_predict = neigh.predict(reduced_test_data)
  File "C:\Python27\lib\site-packages\sklearn\neighbors\classification.py", line 146, in predict
    neigh_dist, neigh_ind = self.kneighbors(X)
  File "C:\Python27\lib\site-packages\sklearn\neighbors\base.py", line 313, in kneighbors
    return_distance=return_distance)
  File "binary_tree.pxi", line 1295, in sklearn.neighbors.ball_tree.BinaryTree.query (sklearn\neighbors\ball_tree.c:9889)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 91, in array2d
    X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
  File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 320, in asarray
    return array(a, dtype, copy=False, order=order)
MemoryError

This error does not happen every time I run the code with slightly changed parameters. Can someone confirm whether this is a bug or whether something else is going on here?

Initial dimension of train data (X) = 42000, 784
Initial dimension of test data (test) = 28000, 784

I am using version 0.14.1
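
For scale, the dense array sizes implied by the shapes above can be estimated with a short calculation. This is only a rough sketch: it assumes each array is held as float64 (8 bytes per value), which the report does not state, and `dense_array_bytes` is a hypothetical helper, not part of scikit-learn. The traceback ends inside `np.asarray`, so at least one more array of about the size of `reduced_test_data` was being requested when memory ran out.

```python
def dense_array_bytes(n_rows, n_cols, itemsize=8):
    """Bytes needed for one dense n_rows x n_cols array.

    itemsize=8 assumes float64 values (an assumption, not from the report).
    """
    return n_rows * n_cols * itemsize


# Shapes taken from the report; the reduced test shape follows from
# 28000 test rows and n_features_to_select = 700.
shapes = {
    "X (train)": (42000, 784),
    "test": (28000, 784),
    "reduced_train_data": (42000, 700),
    "reduced_test_data": (28000, 700),
}

sizes = {name: dense_array_bytes(*shape) for name, shape in shapes.items()}

for name, nbytes in sizes.items():
    print("%-20s %6.1f MiB" % (name, nbytes / 2.0 ** 20))
```

Under that assumption each array is a few hundred MiB at most, so how close the process is to its limit depends on how many copies are alive at once and on the memory available to the Python process.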

@kastnerkyle
scikit-learn member

How much RAM does your system have? Can you provide examples of a script which does not cause the error with the same data?

Also, can you try to recreate the error using something from sklearn.datasets? It makes debugging a little easier.
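
A reproduction along those lines might look like the sketch below. It is an illustration under stated assumptions, not a confirmed repro: the data comes from `sklearn.datasets.make_classification` instead of the original digit data, the sizes are scaled down so it runs quickly (the report used 42000 and 28000 rows with 700 features), and it is written with the Python 3 print function rather than the Python 2 statements in the original script. `predict_in_batches` is a hypothetical helper, not a scikit-learn function; it predicts on slices of the test set so that only one slice is converted at a time. Whether that avoids the MemoryError in the original setup is untested.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier


def predict_in_batches(model, X, batch_size=1000):
    """Call model.predict on row slices of X and join the results.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    parts = [model.predict(X[start:start + batch_size])
             for start in range(0, X.shape[0], batch_size)]
    return np.concatenate(parts)


# Synthetic stand-in for the reduced data. Sizes are scaled down here;
# raise n_samples and n_features toward the reported shapes to probe memory.
X_all, y_all = make_classification(n_samples=700, n_features=50,
                                   n_informative=10, n_classes=3,
                                   random_state=0)
X_train, y_train = X_all[:500], y_all[:500]
X_test = X_all[500:]

# Same KNN settings as in the report.
neigh = KNeighborsClassifier(n_neighbors=5, weights='distance',
                             algorithm='ball_tree')
neigh.fit(X_train, y_train)

batched = predict_in_batches(neigh, X_test, batch_size=64)
full = neigh.predict(X_test)
print(batched.shape, bool(np.array_equal(batched, full)))
```

Each test row is classified independently of the others, so batching should change only how much data is handed to `predict` at once, not the predictions themselves.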

@amueller added this to the 0.15.1 milestone Jul 18, 2014
@amueller
scikit-learn member

Closing for lack of feedback.

@amueller closed this Jan 27, 2015