Why is KNearestNeighbors slower with n_jobs > 1? #6645

@29antonioac

Hi all! I asked this on StackOverflow but didn't get any answer :(. I can't understand why KNearestNeighbors is slower with n_jobs > 1. I've set the multiprocessing start method to forkserver, but it's nearly 100 times slower...

>>> import platform; print(platform.platform())
Linux-4.4.5-1-ARCH-x86_64-with-arch-Arch-Linux
>>> import sys; print("Python", sys.version)
Python 3.5.1 (default, Mar  3 2016, 09:29:07) 
[GCC 5.3.0]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.11.0
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 0.17.0
>>> import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.17.1
>>> 

I ran this sample code; the results are posted below.

#!/usr/bin/env python

import multiprocessing as mp
from time import time

import numpy as np
from sklearn import datasets
from sklearn import neighbors

if __name__ == "__main__":
    # Use the forkserver start method for any worker processes.
    mp.set_start_method('forkserver', force=True)
    np.random.seed(123456)

    iris = datasets.load_iris()

    # n_jobs was changed by hand between runs (1 vs. 2).
    knn = neighbors.KNeighborsClassifier(n_neighbors=3, n_jobs=2)

    # Shuffle the samples, then train on the first 100.
    perm = np.random.permutation(iris.target.size)
    iris.data = iris.data[perm]
    iris.target = iris.target[perm]
    knn.fit(iris.data[:100], iris.target[:100])

    # Time only the scoring step on the remaining 50 samples.
    start = time()
    score = knn.score(iris.data[100:], iris.target[100:])
    end = time()

    print(score)
    print(end - start)

Output:

[antonio@Antonio-Arch Práctica 1: Búsquedas por Trayectorias]$ python cv.py 
0.94
0.001409292221069336 # Time with n_jobs = 1
[antonio@Antonio-Arch Práctica 1: Búsquedas por Trayectorias]$ python cv.py 
0.94
0.10267972946166992 # Time with n_jobs = 2
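
A single time() reading of around a millisecond can be noisy, so a comparison that repeats the measurement and covers both n_jobs values in one run may be easier to reason about. The following is only a minimal sketch under that assumption: best_time and compare_n_jobs are hypothetical helper names introduced here for illustration, not part of scikit-learn, and the data split mirrors the script above.

```python
import time


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time (seconds) over `repeats` calls of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_n_jobs(job_counts=(1, 2), repeats=5):
    """Time KNeighborsClassifier.score once per n_jobs value on an iris split."""
    # Imported here so best_time() stays usable without scikit-learn installed.
    import numpy as np
    from sklearn import datasets, neighbors

    rng = np.random.RandomState(123456)
    iris = datasets.load_iris()
    perm = rng.permutation(iris.target.size)
    X, y = iris.data[perm], iris.target[perm]

    results = {}
    for n_jobs in job_counts:
        knn = neighbors.KNeighborsClassifier(n_neighbors=3, n_jobs=n_jobs)
        knn.fit(X[:100], y[:100])
        # Only the scoring call is timed, as in the original script.
        results[n_jobs] = best_time(lambda: knn.score(X[100:], y[100:]), repeats)
    return results
```

Calling print(compare_n_jobs()) under an if __name__ == "__main__": guard prints a dict mapping each n_jobs value to its fastest scoring time in seconds. Taking the minimum over several repeats is a design choice meant to reduce the influence of one-off delays; it does not remove any cost that is paid on every call.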

Could anyone explain this behaviour to me? I've also tried much more complex datasets and the behaviour is the same :(.

Thank you!
