Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Adding n_jobs to kernel_approximation.Nystroem #18545

Merged
merged 5 commits into from Oct 12, 2020

Conversation

LaurenzReitsam
Copy link
Contributor

Reference Issues/PRs

What does this implement/fix? Explain your changes.

  • Adding n_jobs for parallel calculation of kernel matrix to Nystroem class.
  • Removing unnecessary attribute component_indices_ from Nystroem class

Any other comments?

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @LaurenzReitsam !

Can you provide a comment with a benchmark script and the corresponding results to see how different values of n_jobs improves performance of Nystroem?

sklearn/kernel_approximation.py Outdated Show resolved Hide resolved
@LaurenzReitsam
Copy link
Contributor Author

LaurenzReitsam commented Oct 6, 2020

@thomasjpfan thanks for the quick review!

Here is a benchmark on my local machine:

import time
import numpy as np

X = np.random.rand(100000, 50)

for jobs in range(1, 11):

    tick = time.time()
    Nystroem(n_components=3000, n_jobs=jobs).fit_transform(X)
    tock = time.time()
   print("calculation time with {} jobs: {:3.3f} sec".format(jobs, tock-tick))

This yield the following results:

calculation time with 1 jobs: 26.544 sec
calculation time with 2 jobs: 23.215 sec
calculation time with 3 jobs: 21.303 sec
calculation time with 4 jobs: 20.184 sec
calculation time with 5 jobs: 20.247 sec
calculation time with 6 jobs: 19.733 sec
calculation time with 7 jobs: 19.594 sec
calculation time with 8 jobs: 19.137 sec
calculation time with 9 jobs: 18.562 sec
calculation time with 10 jobs: 19.326 sec

The effect is increasing with the size of n_components while fitting. But the bigger impact is in the transform() function where a n_samples x n_components matrix is calculated with n_samples >> n_components.

@ogrisel
Copy link
Member

ogrisel commented Oct 7, 2020

How many physical and logical cores do you have on your machine?

@LaurenzReitsam
Copy link
Contributor Author

@ogrisel 8 physical cores, 16 cores in total.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a tiny bit of missing doc for this change (see below):

sklearn/kernel_approximation.py Show resolved Hide resolved
@LaurenzReitsam LaurenzReitsam changed the title Adding n_jobs to kernel_approximation.Nystroem [MRG] Adding n_jobs to kernel_approximation.Nystroem Oct 7, 2020
@thomasjpfan
Copy link
Member

There are two levels of parallelism here: n_jobs threading and OPENBLAS_NUM_THREADS. After n_jobs=3, I see that timing performance does change. On my 12 core (24 threads) workstation:

OPENBLAS_NUM_THREADS=12

calculation time with 1 jobs: 13.170 sec
calculation time with 2 jobs: 11.495 sec
calculation time with 3 jobs: 10.851 sec
calculation time with 4 jobs: 10.451 sec
calculation time with 5 jobs: 10.062 sec
calculation time with 6 jobs: 9.981 sec
calculation time with 7 jobs: 10.075 sec
calculation time with 8 jobs: 10.167 sec
calculation time with 9 jobs: 10.113 sec
calculation time with 10 jobs: 10.014 sec
calculation time with 11 jobs: 10.034 sec
calculation time with 12 jobs: 9.983 sec

OPENBLAS_NUM_THREADS=6

calculation time with 1 jobs: 14.889 sec
calculation time with 2 jobs: 13.382 sec
calculation time with 3 jobs: 12.758 sec
calculation time with 4 jobs: 12.493 sec
calculation time with 5 jobs: 12.334 sec
calculation time with 6 jobs: 12.226 sec
calculation time with 7 jobs: 12.172 sec
calculation time with 8 jobs: 12.212 sec
calculation time with 9 jobs: 12.146 sec
calculation time with 10 jobs: 12.132 sec
calculation time with 11 jobs: 12.093 sec
calculation time with 12 jobs: 12.056 sec

OPENBLAS_NUM_THREADS=2

calculation time with 1 jobs: 25.395 sec
calculation time with 2 jobs: 23.863 sec
calculation time with 3 jobs: 23.167 sec
calculation time with 4 jobs: 22.895 sec
calculation time with 5 jobs: 22.718 sec
calculation time with 6 jobs: 22.632 sec
calculation time with 7 jobs: 22.594 sec
calculation time with 8 jobs: 22.573 sec
calculation time with 9 jobs: 22.836 sec
calculation time with 10 jobs: 22.489 sec
calculation time with 11 jobs: 22.640 sec
calculation time with 12 jobs: 23.023 sec

Given this, I think adding n_jobs is okay here since it does speed up computation (up to a point).

@ogrisel
Copy link
Member

ogrisel commented Oct 9, 2020

Thanks for the extensive benchmark @thomasjpfan . Can you also do the same without setting the OPENMP_NUM_THREADS env variable at all to see of the joblib / openmp default behavior works without over-subscription?

@thomasjpfan
Copy link
Member

Can you also do the same without setting the OPENMP_NUM_THREADS env variable at all to see of the joblib / openmp default behavior works without over-subscription?

Without setting OPENMP_NUM_THREADS:

calculation time with 1 jobs: 12.886 sec
calculation time with 2 jobs: 11.674 sec
calculation time with 3 jobs: 10.623 sec
calculation time with 4 jobs: 11.040 sec
calculation time with 5 jobs: 10.479 sec
calculation time with 6 jobs: 10.280 sec
calculation time with 7 jobs: 10.147 sec
calculation time with 8 jobs: 9.958 sec
calculation time with 9 jobs: 10.012 sec
calculation time with 10 jobs: 10.086 sec
calculation time with 11 jobs: 10.635 sec
calculation time with 12 jobs: 10.023 sec

@ogrisel
Copy link
Member

ogrisel commented Oct 12, 2020

Great. So the improvement is not dramatic but it does never do worse that with n_jobs=1. So merge?

@thomasjpfan thomasjpfan merged commit fdb9233 into scikit-learn:master Oct 12, 2020
5 checks passed
amrcode pushed a commit to amrcode/scikit-learn that referenced this pull request Oct 19, 2020
* Update kernel_approximation.py

component_indices_ is not needed

* adding n_jobs to nystroem

* adding description for n_jobs for nystroem

* undo changes of component_indices_

* n_jobs: adding more detailed description
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
* Update kernel_approximation.py

component_indices_ is not needed

* adding n_jobs to nystroem

* adding description for n_jobs for nystroem

* undo changes of component_indices_

* n_jobs: adding more detailed description
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants