Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: Replace range by prange when applicable #13213

Open
jeremiedbb opened this issue Feb 21, 2019 · 7 comments
Open

PERF: Replace range by prange when applicable #13213

jeremiedbb opened this issue Feb 21, 2019 · 7 comments
Labels

Comments

@jeremiedbb
Copy link
Member

@jeremiedbb jeremiedbb commented Feb 21, 2019

We have recently enabled OpenMP support in sklearn, so let's use it !

It would be worth to check if there are place in sklearn cython code where we can use a prange loop instead of a range loop for instance.

@rth

This comment has been minimized.

Copy link
Member

@rth rth commented Feb 21, 2019

Do we have a way to avoid thread over-subscription with BLAS and joblib while doing this?

@jeremiedbb

This comment has been minimized.

Copy link
Member Author

@jeremiedbb jeremiedbb commented Feb 21, 2019

Absolutely ! not on master however :/
Currently I put it in #11950 for now because there was no use elsewhere yet. But I can make a separate PR for that.

It's adapted from some code of loky. Essentially loop over dynamically shared libs to check which blas / openmp is loaded and then use the appropriate method to dynamically set the max number of threads they are allowed to use.

@rth

This comment has been minimized.

Copy link
Member

@rth rth commented Feb 21, 2019

Nice! Yes, maybe splitting it out of #11950 would make review of that one easier as well.

@jnothman

This comment has been minimized.

Copy link
Member

@jnothman jnothman commented Feb 22, 2019

Does the user need a global way to control how many processors are available beyond explicit use of n_jobs?

@jeremiedbb

This comment has been minimized.

Copy link
Member Author

@jeremiedbb jeremiedbb commented Feb 22, 2019

I'm not sure to understand. Here's my thought.

When inside code parallelized using joblib, there's no need to add a second level of parallelism with prange. At least for now, while cython only supports OpenMP. Maybe when cython supports TBB it could be worth.

Inside code not parallelized, there may be some places where we could use a prange. However, there might be some BLAS calls inside these loops. In that case, we need to control the number of threads of the BLAS to avoid oversubscription. I made a contextmanager for that, to locally control BLAS (it can be used to control openmp also but no need for now).

@thomasjpfan

This comment has been minimized.

Copy link
Member

@thomasjpfan thomasjpfan commented Feb 22, 2019

By default, joblib's 'loky' backend forces workers to use only one thread: https://joblib.readthedocs.io/en/latest/parallel.html#avoiding-over-subscription-of-cpu-ressources

To cope with this problem, joblib forces by default supported third-party libraries to use only one thread in workers with the 'loky' backend.

@rmenuet

This comment has been minimized.

Copy link
Contributor

@rmenuet rmenuet commented Feb 28, 2019

FYI, tested switching Joblib for prange in nearest neighbors tree-based queries: it did not improve performance but deactivating some debugging stats did

See #13330 and #13331

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
6 participants
You can’t perform that action at this time.