Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multithreaded kneighbors_graph in TSNE #15082

Merged
merged 6 commits into from Oct 2, 2019
Merged

Conversation

@rth
Copy link
Member

rth commented Sep 24, 2019

Partially addresses #10044
Partially addresses #10023

Computing the kneighbors_graph is one of the bottlenecks in TSNE. Actually we computing it in parallel, including for kd_tree and ball_tree has been supported for a while. This adds the n_jobs parameter to TSNE, which will pass it to NearestNeighbours.

On the first 20k samples of MNIST

import numpy as np
from sklearn.manifold import TSNE
from sklearn.datasets import fetch_openml
from joblib import Memory


memory = Memory('/tmp/joblib_cache')

mnist = memory.cache(fetch_openml)(data_id=41063)

TSNE(verbose=5, n_jobs=8).fit_transform(mnist.data[:20000])

we can get up to a 2.5x run time speedup when running with n_jobs=8,

MNIST n_samples=20k / n_jobs 1 4 8 12
kneighbors_graph (s) 340 130 86 80
total run time (s) 462 246 189 184

The strong scaling limit seem to be around 10 CPU cores in this example.

For the full MNIST (60k samples), k-neighbours search takes proportionally even longer, and therefore multi-core scalability is better, with a x3.0 speed up at n_jobs=8 (and x3.7 at n_jobs=12) ,

MNIST n_samples=60k / n_jobs 1 8 12
kneighbors_graph (min) 61 15 10
total run time (min) 68 22.4 18.5

#10044 (comment) proposed smarter improvements that are still worth investigating, but in any case NN search is a bottleneck and we can't use annoy or similar unlike other TSNE libraries.

Not sure if this needs additional tests given that n_jobs in NearesNeigbours is already extensively tested.

@rth rth mentioned this pull request Sep 24, 2019
@rth rth changed the title Parallel kneighbors_graph in TSNE Multi-threaded kneighbors_graph in TSNE Sep 24, 2019
@rth rth changed the title Multi-threaded kneighbors_graph in TSNE Multithreaded kneighbors_graph in TSNE Sep 24, 2019
@rth

This comment has been minimized.

Copy link
Member Author

rth commented Sep 24, 2019

In the end using the algorithm='brute' on MNIST computes NN in 1min, and runs fully in 9min. We really need better heuristics for algorithm='auto' (#8213).

Though, this PR is still relevant, as for less high dimensional data kd_tree or ball_tree could be the right choice.

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Sep 24, 2019

@rth

This comment has been minimized.

Copy link
Member Author

rth commented Sep 24, 2019

We could also present an example of how to use annoy?

You mean with precomputed distance sparse distance matrices? #10482 added examples/neighbors/approximate_nearest_neighbors.py with exactly that.

The point of this PR is however that we can be significantly faster than what we are now for TSNE even without using external dependencies.

Copy link
Member

jnothman left a comment

Smoke test??

The number of parallel jobs to run for neighbors search.
``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
``-1`` means using all processors. See :term:`Glossary <n_jobs>`
for more details.

This comment has been minimized.

Copy link
@jnothman

jnothman Sep 24, 2019

Member

Worth noting that it's unused if metric is precomputed?

@TomDLT
TomDLT approved these changes Sep 25, 2019
@@ -692,9 +700,11 @@ def _fit(self, X, skip_num_points=0):

if self.metric == "euclidean":
distances = pairwise_distances(X, metric=self.metric,
squared=True)
squared=True,
n_jobs=self.n_jobs)

This comment has been minimized.

Copy link
@jeremiedbb

jeremiedbb Sep 26, 2019

Member

for euclidean metric, having n_jobs > 1 unless blas num threads has been limited.
Could you rerun your benchmark with this n_jobs removed ?

This comment has been minimized.

Copy link
@rth

rth Sep 26, 2019

Author Member

Yes, of course you are right, somehow I missed that in this branch the metric is always euclidean. Reverted the change here. The n_jobs is only useful in pairwise_distances for other computationally expensive metrics. In fact I think we should raise a warning when pairwise_distances is used with an euclidean metric and n_jobs>1.

The benchmarks I have done above are only for method="barnes_hut" (default). I have added the n_jobs for pairwise_distances in method="exact" on general grounds as there are probably some metrics where it's helpful.

This comment has been minimized.

Copy link
@rth

rth Sep 26, 2019

Author Member

The benchmarks I have done above are only for method="barnes_hut" (default). I have added the n_jobs for pairwise_distances in method="exact" on general grounds as there are probably some metrics where it's helpful.

My point is that this has no impact on benchmarks above.

@rth

This comment has been minimized.

Copy link
Member Author

rth commented Sep 26, 2019

OK, clarified the docstring, added a basic test, and a what's new.

rth added 2 commits Sep 26, 2019
@rth

This comment has been minimized.

Copy link
Member Author

rth commented Sep 30, 2019

Should be good to merge unless there are other comments?

@jnothman jnothman merged commit 4ca6ee4 into scikit-learn:master Oct 2, 2019
19 checks passed
19 checks passed
LGTM analysis: C/C++ No code changes detected
Details
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python 7 new alerts
Details
ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: doc Your tests passed on CircleCI!
Details
ci/circleci: doc artifact Link to 0/doc/_changed.html
Details
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 96.76%)
Details
codecov/project 96.76% (+<.01%) compared to 21fc1d9
Details
scikit-learn.scikit-learn Build #20190926.31 succeeded
Details
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
Details
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas) Linux py35_ubuntu_atlas succeeded
Details
scikit-learn.scikit-learn (Linux pylatest_conda_mkl) Linux pylatest_conda_mkl succeeded
Details
scikit-learn.scikit-learn (Linux pylatest_pip_openblas_pandas) Linux pylatest_pip_openblas_pandas succeeded
Details
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit) Linux32 py35_ubuntu_atlas_32bit succeeded
Details
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit) Windows py35_pip_openblas_32bit succeeded
Details
scikit-learn.scikit-learn (Windows py37_conda_mkl) Windows py37_conda_mkl succeeded
Details
scikit-learn.scikit-learn (macOS pylatest_conda_mkl) macOS pylatest_conda_mkl succeeded
Details
@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Oct 2, 2019

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.