Hi all! I asked this on StackOverflow but didn't get an answer :(. I can't understand why KNeighborsClassifier is slower with n_jobs > 1. I've set the multiprocessing start method to forkserver, but it's nearly 100 times slower...
>>> import platform; print(platform.platform())
Linux-4.4.5-1-ARCH-x86_64-with-arch-Arch-Linux
>>> import sys; print("Python", sys.version)
Python 3.5.1 (default, Mar 3 2016, 09:29:07)
[GCC 5.3.0]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.11.0
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 0.17.0
>>> import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.17.1
>>>
I ran the following sample code; the results are posted after it.
#!/usr/bin/env python
import numpy as np
from sklearn import neighbors
from time import time
from sklearn import datasets
import multiprocessing as mp

if __name__ == "__main__":
    mp.set_start_method('forkserver', force=True)
    np.random.seed(123456)
    iris = datasets.load_iris()
    knn = neighbors.KNeighborsClassifier(n_neighbors=3, n_jobs=2)
    perm = np.random.permutation(iris.target.size)
    iris.data = iris.data[perm]
    iris.target = iris.target[perm]
    knn.fit(iris.data[:100], iris.target[:100])
    start = time()
    score = knn.score(iris.data[100:], iris.target[100:])
    end = time()
    print(score)
    print(end - start)
Output:
[antonio@Antonio-Arch Práctica 1: Búsquedas por Trayectorias]$ python cv.py
0.94
0.001409292221069336 # Time with n_jobs = 1
[antonio@Antonio-Arch Práctica 1: Búsquedas por Trayectorias]$ python cv.py
0.94
0.10267972946166992 # Time with n_jobs = 2
Could anyone explain this behaviour to me? I've tried much more complex datasets and the behaviour is the same :(.
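For anyone who wants to repeat the comparison on a larger dataset, here is a minimal sketch of how it could be timed. It is not the code behind the timings above: the synthetic data, the sizes, and the helper names `time_call` and `compare_n_jobs` are assumptions made for illustration only. It takes the median of several runs so that a single slow call does not dominate the result.

```python
from statistics import median
from time import perf_counter


def time_call(fn, repeats=5):
    """Return the median wall-clock time in seconds of calling fn() `repeats` times."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        fn()
        timings.append(perf_counter() - start)
    return median(timings)


def compare_n_jobs(n_jobs_values=(1, 2), n_samples=20000, n_features=20, repeats=5):
    """Time KNeighborsClassifier.score for each n_jobs value on synthetic data.

    The data is random and the sizes are arbitrary (assumptions for illustration).
    Returns a dict mapping n_jobs to the median scoring time in seconds.
    """
    # Imported here so that time_call stays usable without scikit-learn installed.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.RandomState(123456)
    X = rng.rand(n_samples, n_features)
    y = rng.randint(0, 3, size=n_samples)
    half = n_samples // 2

    results = {}
    for n_jobs in n_jobs_values:
        knn = KNeighborsClassifier(n_neighbors=3, n_jobs=n_jobs)
        knn.fit(X[:half], y[:half])
        # Only the scoring step is timed, as in the script above.
        results[n_jobs] = time_call(lambda: knn.score(X[half:], y[half:]), repeats)
    return results
```

Calling `compare_n_jobs()` from under an `if __name__ == "__main__":` guard (needed with the forkserver start method) and printing the returned dict gives one median time per `n_jobs` value to compare.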
Thank you!