Skip to content

Conversation

jeremiedbb
Copy link
Member

Probably the reason of the performance issue reported in INRIA/scikit-learn-mooc#586

in for thread_idx in prange(n_threads, schedule='static', chunksize=1), n_threads is the number of tasks but the number of threads stays unspecified. Like this, the number of tasks might be smaller than the number of threads, but openmp will still spawn all threads even if some of them are idle.

@ogrisel
Copy link
Member

ogrisel commented Feb 21, 2022

Thanks, I wanted to test if that would fix the performance problem on my local machine but unfortunately, I found another bug :)

I reported it here: joblib/loky#354. It should be quite easy to fix.

@ogrisel
Copy link
Member

ogrisel commented Feb 21, 2022

FYI I am currently building this branch in my user container the MOOC platform to give it a try.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I confirm this fixed the over-subscription problem observed on the MOOC platform!

Please add a changelog entry to document the fix.

@jeremiedbb jeremiedbb added the Quick Review For PRs that are quick to review label Feb 21, 2022
@lesteve
Copy link
Member

lesteve commented Feb 21, 2022

Does that mean that each time prange is used it should be using num_threads or be called within a with cython.parallel(num_threads=...) block?

@jeremiedbb
Copy link
Member Author

Does that mean that each time prange is used it should be using num_threads or be called within a with cython.parallel(num_threads=...) block?

I think so because the num_threads we are passing comes from _openmp_effective_n_threads which takes cgroups quota into account.

@lesteve
Copy link
Member

lesteve commented Feb 21, 2022

OK, I am not 100% sure but it seems like there are other places where we don't do that, e.g.

for i in prange(n_samples, schedule='static', nogil=True):

This is not using prange(num_threads=...) I tried to follow the code and I don't think this function is called within a cython.parallel(num_thread=...) but I may be wrong. There may be more places.

I was looking visually at git grep with context to try to find potential problems:

git grep -n -C3 prange

@jeremiedbb
Copy link
Member Author

Yes this one should also set num_threads. Let's do that in another PR since this one is specific to HGBT

Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be the right thing to do, but couldn't test it.

Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
@lesteve lesteve merged commit 5c7dac0 into scikit-learn:main Feb 21, 2022
@lesteve
Copy link
Member

lesteve commented Feb 21, 2022

Merging thanks!

thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Mar 1, 2022
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants